Permutations with fixed pattern densities
Peter Winkler


Abstract 

We study scaling limits of random permutations (“permutons”) 
constrained by having fixed densities of a finite number of patterns. 
We show that the limit shapes are determined by maximizing entropy 
over permutons with those constraints. In particular, we compute 
(exactly or numerically) the limit shapes with fixed 12 density, with 
fixed 12 and 123 densities, with fixed 12 density and the sum of 123 
and 213 densities, and with fixed 123 and 321 densities. In the last 
case we explore a particular phase transition. To obtain our results, we 
also provide a description of permutons using a dynamic construction. 


1 Introduction 

We study pattern densities in permutations. A pattern $\tau \in S_k$ in a
permutation $\sigma \in S_n$ (with $k \le n$) is a $k$-element subset of
indices $1 \le i_1 < \cdots < i_k \le n$ whose image under $\sigma$ has the
same order as that under $\tau$. For example the first three indices in the
permutation 4312 have pattern 321. The density of $\tau \in S_k$ in
$\sigma \in S_n$ is $\binom{n}{k}^{-1}$ times the number of such subsets of
indices.
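
The definition above can be checked by brute force on small permutations. The following is a minimal sketch, not part of the original text; the helper names `pattern_of` and `pattern_density` are our own, and permutations are written in one-line notation as tuples.

```python
from itertools import combinations
from math import comb
from typing import Sequence, Tuple


def pattern_of(values: Sequence[int]) -> Tuple[int, ...]:
    """Return the pattern (ranks 1..k) of a sequence of distinct values."""
    order = sorted(values)
    return tuple(order.index(v) + 1 for v in values)


def pattern_density(sigma: Sequence[int], tau: Sequence[int]) -> float:
    """Fraction of k-element index subsets of sigma whose pattern is tau."""
    n, k = len(sigma), len(tau)
    hits = sum(
        1
        for idx in combinations(range(n), k)
        if pattern_of([sigma[i] for i in idx]) == tuple(tau)
    )
    return hits / comb(n, k)


sigma = (4, 3, 1, 2)
first_three = pattern_of(sigma[:3])        # pattern of the first three indices
d321 = pattern_density(sigma, (3, 2, 1))   # two of the four triples are 321
d12 = pattern_density(sigma, (1, 2))       # only the pair (1, 2) is increasing
```

The enumeration costs $\binom{n}{k}$ pattern evaluations, so it is only meant for illustrating the definition on small $n$.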
Pattern avoidance in permutations is a well-studied and rich area of
combinatorics; see [?] for the history and the current state of the subject.
Less studied is the problem of determining the range of possible densities of
patterns, and the “typical shape” of permutations with constrained densities
of (a fixed set of) patterns. We undertake such a study here. Specifically,
we consider the densities of one or more patterns and consider the feasible
region, or phase space $F$, of possible values of densities of the chosen
patterns for permutations in $S_n$ in the limit of large $n$. For densities in
the interior of $F$ we study the shape of a typical permutation with those
densities, again in the large $n$ limit. We note that the typical shape of
pattern-avoiding permutations (which necessarily lie on the boundary of the
feasible region $F$) has also recently been investigated [?].

To deal with these asymptotic questions we show that the size of our 
target sets of constrained permutations can be estimated by maximizing a 
certain function over limit objects called permutons. Furthermore when (as
appears to be usually the case) the maximizing permuton is unique, properties
of most permutations in the class can then be deduced from it. After
setting up our general framework we work out several examples. To give 
further details we need some notation. 

To a permutation $\pi \in S_n$ one can associate a probability measure
$\gamma_\pi$ on $[0,1]^2$ as follows. Divide $[0,1]^2$ into an $n \times n$
grid of squares of size $1/n \times 1/n$. Define the density of $\gamma_\pi$
on the square in the $i$th row and $j$th column to be the constant $n$ if
$\pi(i) = j$ and 0 otherwise. In other words, $\gamma_\pi$ is a geometric
representation of the permutation matrix of $\pi$.

Define a permuton to be a probability measure $\gamma$ on $[0,1]^2$ with
uniform marginals:

$$\gamma([a,b] \times [0,1]) = b - a = \gamma([0,1] \times [a,b]), \quad \text{for all } 0 \le a \le b \le 1. \tag{1}$$

Note that $\gamma_\pi$ is a permuton for any permutation $\pi \in S_n$.
Permutons were introduced in [15, 16] with a different but equivalent
definition; the measure theoretic view of large permutations can be traced to
[29] and was used in [?, 23] as an analytic representation of permutation
limits equivalent to that used in [?]; the term “permuton” first appeared, we
believe, in [12].

Let $\Gamma$ be the space of permutons. There is a natural topology on
$\Gamma$, the weak topology on probability measures, which can equivalently be
defined as the metric topology defined by the metric $d_\square$ given by
$d_\square(\gamma_1, \gamma_2) = \max |\gamma_1(R) - \gamma_2(R)|$, where $R$
ranges over aligned rectangles in $[0,1]^2$. This topology is also the same as
that given by the $L^\infty$ metric on the cumulative distribution functions
$G_i(x,y) = \gamma_i([0,x] \times [0,y])$. We say that a sequence of
permutations $\pi_n$ with $\pi_n \in S_n$ converges as $n \to \infty$ if the
associated permutons converge in the above sense.

Extending the definition above, given a permuton $\gamma$ the pattern density
of $\tau$ in $\gamma$, denoted $\rho_\tau(\gamma)$, is by definition the
probability that, when $k$ points are selected independently from $\gamma$ and
their $x$-coordinates are ordered, the permutation induced by their
$y$-coordinates is $\tau$. For example, for $\gamma$ with probability density
$g(x,y)\,dx\,dy$, the density of pattern $12 \in S_2$ in $\gamma$ is

$$\rho_{12}(\gamma) = 2\int_{x_1<x_2\in[0,1]}\int_{y_1<y_2\in[0,1]} g(x_1,y_1)\,g(x_2,y_2)\,dx_1\,dy_1\,dx_2\,dy_2. \tag{2}$$

It follows from results of [15, 16] that two permutons are equal if they
have the same pattern densities (for all $k$).

The notion of pattern density for permutons generalizes the notion for
permutations. Note however that the density of a pattern $\sigma \in S_k$ in a
permutation $\tau \in S_n$ (defined to be the number of copies of $\sigma$ in
$\tau$, divided by $\binom{n}{k}$) will not generally be equal to the density
of $\sigma$ in the permuton $\gamma_\tau$; equality will only hold in the limit
of large $n$.
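
To make the discrepancy concrete, here is a small sketch for the pattern 12. It relies on a short computation that is ours rather than the text's: two independent points drawn from the permuton of a permutation of size $n$ land in different grid cells with total contribution $2K/n^2$ (where $K$ is the number of 12 pairs), or in the same cell with probability $1/n$, in which case they form a 12 pattern with probability $1/2$. The function names are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Sequence


def count_12(sigma: Sequence[int]) -> int:
    """Number of index pairs i < j with sigma[i] < sigma[j]."""
    return sum(1 for i, j in combinations(range(len(sigma)), 2) if sigma[i] < sigma[j])


def density_12_permutation(sigma: Sequence[int]) -> float:
    """Density of 12 in the permutation: copies divided by C(n, 2)."""
    return count_12(sigma) / comb(len(sigma), 2)


def density_12_permuton(sigma: Sequence[int]) -> float:
    """Density of 12 in the permuton of sigma (assumed formula, see lead-in):
    different cells contribute 2K/n^2, the same cell contributes (1/n) * (1/2)."""
    n = len(sigma)
    return 2 * count_12(sigma) / n**2 + 1 / (2 * n)


identity = tuple(range(1, 11))
# For the identity the two notions differ by 1/(2n), up to rounding.
gap = density_12_permutation(identity) - density_12_permuton(identity)
```

For the identity permutation the gap is $1/(2n)$, which vanishes as $n$ grows, in line with the remark above.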


1.1 Results 

Theorem 1 below (restated from the somewhat different form in Trashorras
[36]) is a large deviations theorem for permutons: it describes explicitly how
many large permutations lie near a given permuton. The statement is
essentially that the number of permutations in $S_n$ lying near a permuton
$\gamma$ is

$$n!\,e^{(H(\gamma)+o(1))n}, \tag{3}$$

where $H(\gamma)$ is the “permuton entropy” (defined below).

We use this large deviations theorem to prove Theorem 2, which describes
both the number and (when uniqueness holds) limit shape of permutations 
in which a finite number of pattern densities have been fixed. The theorem 
is a variational principle: it shows that the number of such permutations is 
determined by the permuton entropy maximized over the set of permuton(s) 
having those fixed pattern densities. 

Another construction we use replaces permutons by families of insertion
measures $\{\mu_t\}_{t\in[0,1]}$, which is analogous to building a permutation
by inductively inserting one element at a time into a growing list: for each
$i \in [n]$ one inserts $i$ into a random location in the permuted list of the
first $i-1$ elements. This construction is used to describe explicitly the
entropy maximizing permutons with fixed densities of patterns of type
∗∗···∗i (here each ∗ represents an element not exceeding the length of the
pattern; for example, ∗∗2 represents the union of the patterns 132 and 312).
We prove that for this family of patterns the maximizing permutons are
analytic, the entropy function as a function of the constraints is analytic
and strictly concave, and the optimal permutons are unique and have analytic
probability densities.

The most basic example to which we apply our results, the entropy-maximizing
permuton for a fixed density $\rho_{12}$ of 12 patterns, has probability
density

$$g(x,y) = \frac{r(1-e^{-r})}{\left(e^{r(1-x-y)/2} - e^{r(x-y-1)/2} - e^{r(y-x-1)/2} + e^{r(x+y-1)/2}\right)^2}$$

where $r$ is an explicit function of $\rho_{12}$. See Figure 1.



Figure 1: The permuton with fixed density $\rho$ of pattern 12, shown for
$\rho = .2, .4, .8$.

While maximizing permutons can be shown to satisfy certain explicit
PDEs (see Section 8), they can also exhibit a very diverse set of behaviors.
Even in one of the simplest cases, that of fixed density of the two patterns
12 and 123, the variety of shapes of permutons (and therefore of the
approximating permutations) is remarkable: see Figure [?]. In this case we
prove that the feasible region of densities is the so-called “scalloped
triangle” of Razborov [33, 34] which also describes the space of feasible
densities for edges and triangles in the graphon model.

Another example which has been studied recently [?] is the case of the two
patterns 123 and 321. In this case we describe a phase transition in the
feasible region, where the maximizing permuton changes abruptly.

The variational principle can easily be extended to analyze other
constraints that are continuous in the permuton topology. For constraints that
are not continuous, for example the number of cycles of a fixed size, one can
analyze an analogous “weak” characteristic, which is continuous, by applying
the characteristic to patterns. For example, while the number of fixed points
of a permuton is not well-defined, we can compute the expected number of
fixed points for the permutation in $S_n$ obtained by choosing $n$ points
independently from the permuton, and analyze this quantity in the large $n$
limit. This computation will be discussed in a subsequent paper [?]; the
result is that the expected weak number of fixed points is

$$\int_0^1 g(x,x)\,dx$$

when $\gamma$ has a continuous density $g$. Similar expressions hold for
cycles of other lengths.

1.2 Analogies with graphons 

For those who are familiar with variational principles for dense graphs
[8, ?] we note the following differences between the graph case and the
permutation case (see [23] for background on graph asymptotics):

1. Although permutons serve the same purpose for permutations that
graphons serve for graphs, and (being defined on $[0,1]^2$) are
superficially similar, they are measures (not symmetric functions) and
represent permutations in a different way. (One can associate a graphon with
a limit of permutations, via comparability graphs of two-dimensional
posets, but these have trivial entropy in the Chatterjee-Varadhan sense
[8] and we do not consider them here.)

2. The classes of constrained (dense) graphs considered in [?] have size
about $e^{cn^2}$, $n$ being the number of vertices and the (nonnegative)
constant $c$ being the target of study. Classes of permutations in $S_n$ are
of course of size at most $n! \sim e^{n(\log n - 1)}$, but the constrained
ones we consider here have size of order not $e^{cn\log n}$ for $c \in (0,1)$,
as one might at first expect, but instead $e^{n\log n - n + cn}$ where
$c \in [-\infty, 0]$ is the quantity of interest.

3. The “entropy” function, i.e., the function of the limit structure to be 
maximized, is bounded for graphons but unbounded for permutons. 
This complicates the analysis for permutations. 

4. The limit structures that maximize the entropy function tend, in the
graph case, to be combinatorial objects: step-graphons corresponding to
what Radin, Ren and Sadun call “multipodal” graphs [32]. In
contrast, maximizing permutons at interior points of feasible regions
seem always to be smooth measures with analytic densities. Although
they are more complicated than maximizing graphons, these limit objects
are more suitable for classical variational analysis, e.g., differential
equations of the Euler-Lagrange type.

2 Variational principle 

For convenience, we denote the unit square $[0,1]^2$ by $Q$.

Let $\gamma$ be a permuton with density $g$ defined almost everywhere. We
compute the permutation entropy $H(\gamma)$ of $\gamma$ as follows:

$$H(\gamma) = \int_Q -g(x,y)\log g(x,y)\,dx\,dy \tag{4}$$

where “$0\log 0$” is taken as zero. Then $H$ is finite whenever $g$ is bounded
(and sometimes when it is not). In particular for any $\sigma \in S_n$, we have
$H(\gamma_\sigma) = -n(n\log n/n^2) = -\log n$ and therefore
$H(\gamma_\sigma) \to -\infty$ for any sequence of increasingly large
permutations even though $H(\lim \gamma_\sigma)$ may be finite. Note that $H$
is zero on the uniform permuton (where $g(x,y) \equiv 1$) and negative
(sometimes $-\infty$) on all other permutons, since the function $-z\log z$
is concave. If $\gamma$ has no density, we define $H(\gamma) = -\infty$.
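
As a quick numerical illustration of these statements (a sketch of ours, with hypothetical helper names), the entropy of a permuton whose density is constant on each cell of an $m \times m$ grid reduces to a finite sum over the cell masses. The example recovers $H = 0$ for the uniform permuton and $H(\gamma_\sigma) = -\log n$ for a permutation.

```python
from math import log
from typing import List, Sequence


def step_entropy(mass: List[List[float]]) -> float:
    """Entropy of a step permuton with cell masses mass[i][j] on an m x m grid.
    The density on cell (i, j) is m^2 * mass[i][j]; 0 log 0 is taken as 0."""
    m = len(mass)
    return -sum(p * log(m * m * p) for row in mass for p in row if p > 0)


def permutation_masses(sigma: Sequence[int]) -> List[List[float]]:
    """Cell masses of the permuton of sigma: mass 1/n where sigma(i) = j."""
    n = len(sigma)
    return [[1 / n if sigma[i] == j + 1 else 0.0 for j in range(n)] for i in range(n)]


uniform = [[1 / 16] * 4 for _ in range(4)]
h_uniform = step_entropy(uniform)                        # zero
h_perm = step_entropy(permutation_masses((4, 3, 1, 2)))  # equals -log 4
```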

We use the following large deviations principle, first stated in a somewhat
different form by Trashorras (Theorem 1 in [36]); see also Theorem 4.1 in [28].
In Section 11 we give an alternative proof.

Theorem 1 ([36]). Let $\Lambda$ be a set of permutons, $\Lambda_n$ the set of
permutations $\pi \in S_n$ with $\gamma_\pi \in \Lambda$. Then:

1. If $\Lambda$ is closed,

$$\lim_{n\to\infty}\frac{1}{n}\log\frac{|\Lambda_n|}{n!} \le \sup_{\gamma\in\Lambda} H(\gamma); \tag{5}$$

2. If $\Lambda$ is open,

$$\lim_{n\to\infty}\frac{1}{n}\log\frac{|\Lambda_n|}{n!} \ge \sup_{\gamma\in\Lambda} H(\gamma). \tag{6}$$

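To make a connection with our applications to large constrained
permutations, fix some finite set $\mathcal{P} = \{\pi_1, \ldots, \pi_k\}$ of
patterns. Let $\alpha = (\alpha_1, \ldots, \alpha_k)$ be a vector of desired
pattern densities. We then define two sets of permutons:

$$\Lambda^{\alpha,\varepsilon} = \{\gamma \in \Gamma \mid |\rho_{\pi_j}(\gamma) - \alpha_j| < \varepsilon \text{ for each } 1 \le j \le k\} \tag{7}$$

and

$$\Lambda^{\alpha} = \{\gamma \in \Gamma \mid \rho_{\pi_j}(\gamma) = \alpha_j \text{ for each } 1 \le j \le k\}. \tag{8}$$

With that notation, and the understanding that
$\Lambda_n^{\alpha,\varepsilon} = \Lambda^{\alpha,\varepsilon} \cap \gamma(S_n)$,
where $\gamma(\pi) = \gamma_\pi$ as before, our first main result is: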

Theorem 2.

$$\lim_{\varepsilon\downarrow 0}\lim_{n\to\infty}\frac{1}{n}\log\frac{|\Lambda_n^{\alpha,\varepsilon}|}{n!} = \max_{\gamma\in\Lambda^{\alpha}} H(\gamma).$$

The value $\max_{\gamma\in\Lambda^{\alpha}} H(\gamma)$ (which is guaranteed by
the theorem to exist, but may be $-\infty$) will be called the constrained
entropy and denoted by $s(\alpha)$. In Section 4 we will prove Theorem 2.

Theorem 2 puts us in a position to try to describe and enumerate
permutations with some given pattern densities. It does not, of course,
guarantee that there is just one $\gamma \in \Lambda^{\alpha}$ that maximizes
$H(\gamma)$, nor that there is one with finite entropy. As we shall see it
seems to be the case that interior points in feasible regions for pattern
densities do have permutons with finite entropy, and usually just one
optimizer. Points on the boundary of a feasible region (e.g.,
pattern-avoiding permutations) often have only singular permutons, and since
the latter always have entropy $-\infty$, Theorem 2 will not be of direct use
there.


3 Feasible regions and entropy optimizers 

We collect here some general facts about feasible regions and entropy
optimizers, making use of concavity of entropy and the “heat flow on
permutons”.
3.1 Heat flow on permutons

The heat flow is a continuous flow on the space of permutons with the
property that for any permuton $\mu = \mu_0$ and any positive time $t > 0$,
$\mu_t$ has analytic density (and thus finite entropy).

The flow after time $t$ is given by the action of the heat operator
$e^{t\Delta}$ where $\Delta$ is the Laplacian on the square with reflecting
boundary conditions. One can describe the flow concretely as follows. First,
one can describe a permuton $\mu$ by its characteristic function
$\hat g(u,v) = \mathbb{E}[e^{i(ux+vy)}]$. In fact since we are on the unit
square we can use instead the discrete Fourier cosine series

$$\hat g(j,k) = \mathbb{E}[\cos(\pi j x)\cos(\pi k y)]$$

with $j, k \ge 0$.

The operator $e^{t\Delta}$ acts on the coefficients by multiplication by
$e^{-(j^2+k^2)t}$:

$$\hat g_t(j,k) = e^{-(j^2+k^2)t}\,\hat g_0(j,k).$$

Note that the heat flow preserves the marginals, that is
$\hat g_t(j,0) = \delta_j = \hat g_0(j,0)$ and
$\hat g_t(0,k) = \delta_k = \hat g_0(0,k)$.

For any $t > 0$ the Fourier coefficients $\hat g_t(j,k)$ then decay
exponentially quickly so that $\hat g_t(j,k)$ are the Fourier coefficients of
a measure with analytic density.
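
As an illustration (a sketch under our own assumptions, not taken from the text), consider the identity permuton, which is supported on the diagonal and has entropy $-\infty$. Its cosine coefficients are 1 at $(0,0)$, $1/2$ at $(j,j)$ for $j \ge 1$, and 0 otherwise, so after damping by $e^{-(j^2+k^2)t}$ and summing the cosine series (with weight 2 per nonzero index) one obtains a smooth density. The function name and the truncation level `n_terms` are hypothetical choices.

```python
from math import cos, exp, pi


def heated_identity_density(x: float, y: float, t: float, n_terms: int = 40) -> float:
    """Density at (x, y) of the identity permuton after heat flow for time t > 0.
    The coefficient 1/2 at (j, j) is damped by exp(-2 j^2 t) and carries weight
    2 * 2 in the cosine series, giving the factor 2 in each term below."""
    total = 1.0
    for j in range(1, n_terms + 1):
        total += 2.0 * exp(-2.0 * j * j * t) * cos(pi * j * x) * cos(pi * j * y)
    return total


# Midpoint-rule check that the marginal in y is uniform at a sample x.
n = 200
marginal = sum(heated_identity_density(0.3, (i + 0.5) / n, 0.1) for i in range(n)) / n
```

The mass stays concentrated near the diagonal for small $t$, while the uniform marginals are preserved because every term with $j \ge 1$ integrates to zero in either variable.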

3.2 Elementary consequences for feasible regions 

Let $R$ be the feasible region for permutons with some finite set of pattern
densities. Let $R_M$ be the subset of $R$ consisting of points representable by
an analytic permuton with entropy at least $-M$, and $R_*$ those representable
by permutons with finite entropy.

Entropy is upper-semicontinuous on $R$ (just as it is on the space $\Gamma$ of
all permutons). So $R_M$ is closed.

Theorem 3. $R_*$ is dense in $R$.

Proof. Any permuton $\gamma$ may be perturbed (by the heat flow for small
time, thus moving densities only a small amount) to achieve an analytic
permuton with finite entropy. □

Theorem 4. Let $C$ be a topological sphere in $R_M$. Then the interior of $C$
is contained in $R_M$.



Proof. Consider the space of all permutons obtainable as convex
combinations of the entropy-maximizing permutons on $C$. This set is convex and
contains a topological disk whose boundary is the set of entropy-maximizing
permutons on $C$, parameterized in order. The image of this disk in $R$
provides a homotopy of $C$ to a point and thus contains the interior of $C$.
Since the entropy function is concave, the entropies of the points in the
space of convex combinations are all at least $-M$. □

We now give several corollaries of Theorem 4.

Corollary 5. R contains no local minimum of the entropy, nor any local 
maximum that is not a global maximum of the entropy. 

Proof. A local minimum is the minimum in a disk around it. Let $C$, as in the
proof of Theorem 4, be the boundary of this disk. Concavity of entropy implies
that the minimum of the entropy on the disk occurs on $C$, a contradiction.

For the second statement, convex combinations connect two local maxima
with different values by a path in $R$ whose entropies are lower bounded by
the minimum of the two values, a contradiction. □

Corollary 6. If R is the feasible region for a single density, then it is an 
interval on the interior of which entropy is finite and concave. 

Note that for any pattern $\pi$ there is a permuton which has zero density for
that pattern (either the identity permuton or the ‘anti-identity’ permuton).
The maximal $\pi$-density permuton(s) are not known in general, although a
lower bound on the maximal density is obtained from the permuton $\gamma_\pi$.

Corollary 7. If $R$ is the feasible region for two densities, then $R_M$ is
simply connected and $R$ and $R_*$ are connected.

4 Proof of Theorem 2

Since we will be approximating $H$ by Riemann sums, it is useful to define,
for any permuton $\gamma$ and any positive integer $m$, an approximating
“step-permuton” $\gamma^m$ as follows. Fixing $m$, denote by $Q_{ij}$ the
half-open square $((i-1)/m, i/m] \times ((j-1)/m, j/m]$; for each
$1 \le i,j \le m$, we want $\gamma^m$ to be uniform on $Q_{ij}$ with
$\gamma^m(Q_{ij}) = \gamma(Q_{ij})$. In terms of the density $g^m$ of
$\gamma^m$, we have $g^m(x,y) = m^2\gamma(Q_{ij})$ for all $(x,y) \in Q_{ij}$.

To prove Theorem 2 we use the following result.

Proposition 8. For any permuton $\gamma$,
$\lim_{m\to\infty} H(\gamma^m) = H(\gamma)$, with $H(\gamma^m)$
diverging downward when $H(\gamma) = -\infty$.
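
The convergence of the step-permuton entropies can be observed numerically. The sketch below uses a hypothetical test density $g(x,y) = 1 + A\cos(\pi x)\cos(\pi y)$ with $|A| < 1$ (our choice, not from the text), which has uniform marginals and exactly computable cell masses. Since the grids for $m = 1, 2, 4, \ldots$ are nested, convexity of $u \log u$ makes $H(\gamma^m)$ nonincreasing along this sequence.

```python
from math import log, pi, sin
from typing import Dict

A = 0.9  # hypothetical amplitude; any |A| < 1 keeps the density positive


def cell_mass(i: int, j: int, m: int) -> float:
    """Exact mass of ((i-1)/m, i/m] x ((j-1)/m, j/m] under 1 + A cos(pi x) cos(pi y)."""
    dx = sin(pi * i / m) - sin(pi * (i - 1) / m)
    dy = sin(pi * j / m) - sin(pi * (j - 1) / m)
    return 1.0 / m**2 + A * dx * dy / pi**2


def step_entropy(m: int) -> float:
    """H(gamma^m) = -(1/m^2) * sum of g_ij log g_ij, with g_ij = m^2 gamma(Q_ij)."""
    total = 0.0
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            g_ij = m * m * cell_mass(i, j, m)
            total += g_ij * log(g_ij)
    return -total / m**2


entropies: Dict[int, float] = {m: step_entropy(m) for m in (1, 2, 4, 8, 16, 32)}
```

The successive values settle quickly, consistent with an error of order $1/m^2$ for a smooth density.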

In what follows we will, in order to increase readability, write

$$\iint_Q -g(x,y)\log g(x,y)\,dx\,dy \tag{9}$$

as just $\int_Q -g\log g$. Also for the sake of readability, we will for this
section only state results in terms of $g\log g$ rather than $-g\log g$; this
avoids clutter caused by a multitude of absolute values and negations.
Eventually, however, we will need to deal with an entropy function
$H(\gamma) = \int_Q -g\log g$ that takes values in $[-\infty, 0]$.

Define

$$g_{ij} = m^2\gamma(Q_{ij}). \tag{10}$$

We wish to show that the Riemann sum

$$\frac{1}{m^2}\sum_{1\le i,j\le m} g_{ij}\log g_{ij}, \tag{11}$$

which we denote by $R_m(\gamma)$, approaches $\int_Q g\log g$ when $\gamma$ is
absolutely continuous with respect to Lebesgue measure, i.e., when the density
$g$ exists a.e., and otherwise diverges to $\infty$. There are thus three
cases:

1. $g$ exists and $\int_Q g\log g < \infty$;

2. $g$ exists but $g\log g$ is not integrable, i.e., its integral is $\infty$;

3. $\gamma$ is singular.

Let $A(t) = \{(x,y) \in Q : g(x,y)\log g(x,y) > t\}$.

In the first case, we have that $\limsup_{t\to\infty}\int_{A(t)} g\log g = 0$,
and since $g\log g > t$ on $A(t)$, we have
$\limsup_{t\to\infty} |A(t)|\,t = 0$ where $|A|$ denotes the Lebesgue measure
of $A \subset Q$. (We need not concern ourselves with large negative values,
since the function $x\log x$ is bounded below by $-1/e$.)

In the second case, we have the opposite, i.e., for some $\varepsilon > 0$ and
any $s$ there is a $t > s$ with $t|A(t)| > \varepsilon$.

In the third case, we have a set $A \subset Q$ with $\gamma(A) > 0$ but
$|A| = 0$.

In the proof that follows we do not use the fact that $\gamma$ has uniform
marginals, or that it is normalized to have $\gamma(Q) = 1$. Thus we restate
Proposition 8 in greater generality:
Proposition 9. Let $\gamma$ be a finite measure on $Q = [0,1]^2$ and
$R_m = R_m(\gamma)$. Then:

1. If $\gamma$ is absolutely continuous with density $g$, and $g\log g$ is
integrable, then $\lim_{m\to\infty} R_m = \int_Q g\log g$.

2. If $\gamma$ is absolutely continuous with density $g$, and $g\log g$ is not
integrable, then $\lim_{m\to\infty} R_m = \infty$.

3. If $\gamma$ is singular, then $\lim_{m\to\infty} R_m = \infty$.

Proof. We begin with the first case, where we need to show that for any
$\varepsilon > 0$, there is an $m_0$ such that for $m \ge m_0$,

$$\int_Q g\log g - \frac{1}{m^2}\sum_{i,j=1}^{m} g_{ij}\log g_{ij} < \varepsilon. \tag{12}$$

Note that since $x\log x$ is convex, the quantity on the left cannot be
negative.

Lemma 10. Let $\varepsilon > 0$ be fixed and small. Then there are
$\delta > 0$ and $s$ with the following properties:

1. $\left|\int_{A(s)} g\log g\right| < \delta^2/4$;

2. $|A(s)| < \delta^2/4$;

3. for any $u, v \in [0, s+1]$, if $|u - v| < \delta$ then
$|u\log u - v\log v| < \varepsilon/4$;

4. for any $B \subset Q$, if $|B| < 2\delta$ then
$\int_B |g\log g| < \varepsilon/4$.

Proof. By Lebesgue integrability of $g\log g$, we can immediately choose
$\delta_0$ such that any $\delta < \delta_0$ will satisfy the fourth property.

We now choose $s_1$ so that $\int_{A(s_1)} g\log g < \delta_0/4$, and
$t|A(t)| < 1$ for all $t \ge s_1$. Since $[0, s_1]$ is compact we may choose
$\delta_1 < \delta_0$ such that for any $u, v \in [0, s_1+1]$,
$|u - v| < \delta_1$ implies $|u\log u - v\log v| < \varepsilon/4$. We are done
if $|A(s_1)| < \delta_1^2/4$ but since $\delta_1$ depends on $s_1$, it is not
immediately clear that we can achieve this. However, we know that since
$\frac{d}{du}\,u\log u = 1 + \log u$, the dependence of $\delta_1$ on $s_1$ is
only logarithmic, while $|A(s_1)|$ descends at least as fast as $1/s_1$.

So we take $k = \lceil \log(\delta/2)/\log(\varepsilon/2) \rceil$ and let
$\delta = \delta_1/k$, $s = s_1^k$. Then $u, v \in [0, s+1]$ and
$|u - v| < \delta$ implies $|u\log u - v\log v| < \varepsilon/4$, and

$$\int_{A(s)} g\log g < \left(\int_{A(s_1)} g\log g\right) < (\varepsilon/2)^{2\log(\delta/2)/\log(\varepsilon/2)} = \delta^2/4 \tag{13}$$

as desired. Since $u\log u > u > 1$ for $u > e$, we get
$|A(s)| < \delta^2/4$. □

Henceforth $s$ and $\delta$ will be fixed, satisfying the conditions of
Lemma 10. Since $g$ is measurable we can find a subset $C \subset Q$ with
$|C| = |A(s)| < \delta^2/4$ such that $g$, and thus also $g\log g$, is
continuous on $Q \setminus C$. Since $\int_B g\log g$ is maximized by
$B = A(s)$ for sets $B$ with $|B| = |A(s)|$,
$\left|\int_C g\log g\right| \le \int_{A(s)} g\log g$, so
$\left|\int_{A(s)\cup C} g\log g\right| < \delta^2/2$. We can then find an open
set $A$ containing $A(s) \cup C$ with $|A|$ and $\int_A g\log g$ both bounded
by $\delta^2$.

We now invoke the Tietze Extension Theorem to choose a continuous
$f : Q \to \mathbb{R}$ with $f(x,y) = g(x,y)$ on $Q \setminus A$, and
$f\log f < s$ on all of $Q$. Since $f$ is continuous and bounded, $f$ and
$f\log f$ are Riemann integrable. Let $f_{ij}$ be the mean value of $f$ over
$Q_{ij}$, i.e.,

$$f_{ij} = m^2\int_{Q_{ij}} f. \tag{14}$$

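Since, on any $Q_{ij}$, $\inf f\log f \le f_{ij}\log f_{ij} \le \sup f\log f$,
we can choose $m_0$ such that $m \ge m_0$ implies

$$\left|\int_Q f\log f - \frac{1}{m^2}\sum_{ij} f_{ij}\log f_{ij}\right| < \varepsilon/4. \tag{15}$$

We already have

$$\left|\int_Q g\log g - \int_Q f\log f\right| = \left|\int_A g\log g - \int_A f\log f\right| \le \left|\int_A g\log g\right| + \left|\int_A f\log f\right| < 2\delta^2 < \varepsilon/4. \tag{16}$$

Thus, to get (12) from (15) and (16), it suffices to bound

$$\left|\frac{1}{m^2}\sum_{ij} g_{ij}\log g_{ij} - \frac{1}{m^2}\sum_{ij} f_{ij}\log f_{ij}\right| \tag{17}$$

by $\varepsilon/2$.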

Fixing $m \ge m_0$, call the pair $(i,j)$, and its corresponding square
$Q_{ij}$, “good” if $|Q_{ij} \cap A| < \delta/(2m^2)$. The number of bad
(i.e., non-good) squares cannot exceed $2\delta m^2$, else
$|A| > 2\delta m^2 \cdot \delta/(2m^2) = \delta^2$.

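For the good squares, we have

$$|g_{ij} - f_{ij}| = m^2\left|\int_{Q_{ij}\cap A}(g - f)\right| \le m^2\int_{Q_{ij}\cap A} 2g < 2(\delta/2) = \delta \tag{18}$$

with $f_{ij} < s$, thus $f_{ij}$ and $g_{ij}$ both in $[0, s+1]$. It follows
that

$$|g_{ij}\log g_{ij} - f_{ij}\log f_{ij}| < \varepsilon/4 \tag{19}$$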

and therefore the “good” part of the Riemann sum discrepancy, namely

$$\frac{1}{m^2}\sum_{\text{good } ij}\left(g_{ij}\log g_{ij} - f_{ij}\log f_{ij}\right), \tag{20}$$

is itself bounded by $\varepsilon/4$.

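Let $Q'$ be the union of the bad squares, so
$|Q'| \le 2\delta m^2 \cdot (1/m^2) = 2\delta$; then by (15) and convexity of
$u\log u$,

$$\left|\frac{1}{m^2}\sum_{\text{bad } ij}\left(g_{ij}\log g_{ij} - f_{ij}\log f_{ij}\right)\right| \le 2\left|\int_{Q'} g\log g\right| < 2(\varepsilon/8) = \varepsilon/4 \tag{21}$$

and we are done with the case where $g\log g$ is integrable.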

Suppose $g$ exists but $g\log g$ is not integrable; we wish to show that for
any $M$, there is an $m_1$ such that $m \ge m_1$ implies
$\frac{1}{m^2}\sum g_{ij}\log g_{ij} > M$.

For $t \ge 1$, define the function $g^t$ by $g^t(x,y) = g(x,y)$ when
$g(x,y) \le t$, i.e. when $(x,y) \notin A(t)$, otherwise $g^t(x,y) = 0$. Then
$\int_Q g^t\log g^t \to \infty$ as $t \to \infty$, so we may take $t$ so that
$\int_Q g^t\log g^t > M + 1$. Let $\gamma^t$ be the (finite) measure on $Q$ for
which $g^t$ is the density. Since $g^t$ is bounded (by $t$), $g^t\log g^t$ is
integrable and we may apply the first part of Proposition 9 to get an $m_1$
so that $m \ge m_1$ implies that $R_m(\gamma^t) > M$.

Since $t \ge 1$, $g\log g \ge g^t\log g^t$ everywhere and hence, for every
$m$, $R_m(\gamma^t) \le R_m(\gamma)$. It follows that $R_m(\gamma) > M$ for
$m \ge m_1$ and this case is done.


Finally, suppose $\gamma$ is singular and let $A$ be a set of Lebesgue measure
zero for which $\gamma(A) = a > 0$.

Lemma 11. For any $\varepsilon > 0$ there is an $m_2$ such that $m \ge m_2$
implies that there are $\varepsilon m^2$ squares of the $m \times m$ grid that
cover at least half the $\gamma$-measure of $A$.


Proof. Note first that if $B$ is an open disk in $Q$ of radius at most
$\delta$, then for $m > 1/(2\delta)$ we can cover $B$ with cells of an
$m \times m$ grid of total area at most $64\delta^2$. The reason is that such
a disk cannot contain more than
$\lceil 2\delta/(1/m) \rceil^2 < (4\delta m)^2$ grid vertices, each of which
can be a corner of at most four cells that intersect the disk. Thus, rather
conservatively, the total area of the cells that intersect the disk is bounded
by $(4/m^2)\cdot(4\delta m)^2 = 64\delta^2$. It follows that as long as a disk
has radius at least $1/(2m)$, it costs at most a factor of $64/\pi$ to cover
it with grid cells.

Now cover $A$ with open disks of area summing to at most
$\pi\varepsilon/64$. Let $b_n$ be the $\gamma$-measure of the union of the
disks of radii at least $1/(2n)$. Choose $m_2$ such that $b_{m_2} > a/2$ to
get the desired result. □


Let $M$ be given and use Lemma 11 to find $m_2$ such that for any
$m \ge m_2$, there is a set $I \subset \{1,\ldots,m\}^2$ of size at most
$\delta m^2$ such that $\gamma\left(\bigcup_I Q_{ij}\right) > a/2$, where
$Q_{ij} = ((i-1)/m, i/m] \times ((j-1)/m, j/m]$ as before and $\delta$ is a
small positive quantity depending on $M$ and $a$, to be specified later. Then

$$R_m(\gamma) = \frac{1}{m^2}\sum_{ij} g_{ij}\log g_{ij} \ge -\frac{1}{e} + \frac{1}{m^2}\,\delta m^2\,\bar g\log\bar g = -\frac{1}{e} + \delta\,\bar g\log\bar g \tag{22}$$

where $\bar g$ is the mean value of $g_{ij}$ over $(i,j) \in I$, the last
inequality following from the convexity of $u\log u$. The $-1/e$ term is
needed to account for possible negative values of $g\log g$.

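But $\sum_I g_{ij} = m^2\gamma\left(\bigcup_I Q_{ij}\right) > m^2 a/2$, so
$\bar g > (m^2 a/2)/(\delta m^2) = a/(2\delta)$. Consequently

$$R_m(\gamma) > -\frac{1}{e} + \delta\,\frac{a}{2\delta}\log\frac{a}{2\delta} = -\frac{1}{e} + \frac{a}{2}\log\frac{a}{2\delta}. \tag{23}$$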


Taking

$$\delta = \frac{a}{2}\exp\left(-\frac{2(M + 1/e)}{a}\right) \tag{24}$$

gives $R_m(\gamma) > M$ as required, and the proof of Proposition 8 is
complete. □
We now prove Theorem 2.

Proof. The set $\Lambda^{\alpha,\varepsilon}$ of permutons under consideration
consists of those for which certain pattern densities are close to values in
the vector $\alpha$. Note first that since the density function
$\rho(\pi, \cdot)$ is continuous in the topology of $\Gamma$,
$\Lambda^{\alpha}$ is closed and by compactness $H(\gamma)$ takes a maximum
value on $\Lambda^{\alpha}$.

Again by continuity of $\rho(\pi, \cdot)$, $\Lambda^{\alpha,\varepsilon}$ is an
open set and we have from the second statement of Theorem 1 that for any
$\varepsilon$,

$$\lim_{n\to\infty}\frac{1}{n}\log\frac{|\Lambda_n^{\alpha,\varepsilon}|}{n!} \ge \max_{\gamma\in\Lambda^{\alpha,\varepsilon}} H(\gamma) \ge \max_{\gamma\in\Lambda^{\alpha}} H(\gamma) \tag{25}$$

from which we deduce that

$$\lim_{\varepsilon\downarrow 0}\lim_{n\to\infty}\frac{1}{n}\log\frac{|\Lambda_n^{\alpha,\varepsilon}|}{n!} \ge \max_{\gamma\in\Lambda^{\alpha}} H(\gamma). \tag{26}$$

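To get the reverse inequality, fix a $\gamma \in \Lambda^{\alpha}$ maximizing
$H(\gamma)$. Let $\delta > 0$; since $H$ is upper semi-continuous and
$\Lambda^{\alpha}$ is closed, we can find an $\varepsilon' > 0$ such that no
permuton $\gamma'$ within distance $\varepsilon'$ of $\Lambda^{\alpha}$ has
$H(\gamma') > H(\gamma) + \delta$. But again since $\rho(\pi, \cdot)$ is
continuous, for small enough $\varepsilon$, every
$\gamma' \in \Lambda^{\alpha,\varepsilon}$ is indeed within distance
$\varepsilon'$ of $\Lambda^{\alpha}$. Let $\Lambda'$ be the (closed) set of
permutons $\gamma'$ satisfying $|\rho(\pi_j, \gamma') - \alpha_j| \le \varepsilon$
for each $j$; then, using the first statement of Theorem 1, we have thus

$$\lim_{n\to\infty}\frac{1}{n}\log\frac{|\Lambda'_n|}{n!} \le H(\gamma) + \delta \tag{27}$$

and since such a statement holds for arbitrary $\delta > 0$, the result
follows. □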

5 Insertion measures 

A permuton $\gamma$ can be described by a family of insertion measures. This
description will be useful for constructing concrete examples, in particular
for the so-called star models, which are discussed in Sections 6 and 7 below.

The insertion measures are a family of probability measures
$\{\nu_x\}_{x\in[0,1]}$, with measure $\nu_x$ supported on $[0,x]$. This
family is a continuum version of the process of building a random permutation
on $[n]$ by, for each $i$, inserting $i$ at a random location in the
permutation formed from $\{1,\ldots,i-1\}$. Any permutation measure can be
built this way. We describe here how any permuton can be built from a family
of independent insertion measures, and conversely, how every permuton defines
a unique family of independent insertion measures.

We first describe how to reconstruct the insertion measures from the
permuton $\gamma$. Let $Y_x \in [0,1]$ be the random variable with law
$\gamma|_{\{x\}\times[0,1]}$. Let $Z_x \in [0,x]$ be the random variable (with
law $\nu_x$) giving the location of the insertion of $x$ (at time $x$), and
let $F(x, \cdot)$ be its CDF. Then

$$F(x, \tilde y) = \Pr(Z_x < \tilde y) = \Pr(Y_x < y) = G_x(x,y) \tag{28}$$

where $\tilde y$ is defined by $G(x,y) = \tilde y$.

More succinctly, we have

$$F(x, G(x,y)) = G_x(x,y). \tag{29}$$


Conversely, given the insertion measures, equation (29) is a differential
equation for $G$. Concretely, after we insert $x_0$ at location
$X(x_0) = Z_{x_0}$, the image flows under future insertions according to the
(deterministic) evolution

$$\frac{d}{dx}X(x) = F_x(X(x)), \qquad X(x_0) = Z_{x_0}. \tag{30}$$

If we let $\Psi_{[x,1]}$ denote the flow up until time 1, then the permuton is
the push-forward under $\Psi$ of $\nu_x$:

$$\gamma_x = (\Psi_{[x,1]})_*(\nu_x). \tag{31}$$

A more geometric way to see this correspondence is as follows. Project
the graph of $G$ in $\mathbb{R}^3$ onto the $xz$-plane; the image of the
curves $G([0,1] \times \{y\})$ are the flow lines of the vector field (30).
The divergence of the flow lines at $(x,y)$ is $f(x,y)$, the density
associated with $F(x,y)$.

The permuton entropy can be computed from the entropy of the insertion 
measures as follows. 

Lemma 12.

$$H(\gamma) = \int_0^1\int_0^x -f(x,y)\log\left(x f(x,y)\right)\,dy\,dx. \tag{32}$$


Proof. Differentiating (29) with respect to $y$ gives

$$f(x, G(x,y))\,G_y(x,y) = g(x,y). \tag{33}$$

Thus the RHS of (32) becomes

$$\int_0^1\int_0^x -\frac{g(x,y)}{G_y(x,y)}\log\frac{x\,g(x,y)}{G_y(x,y)}\,d\tilde y\,dx. \tag{34}$$

Substituting $\tilde y = G(x,y)$ with $d\tilde y = G_y(x,y)\,dy$ we have

$$\int_0^1\int_0^1 -g(x,y)\log\frac{x\,g(x,y)}{G_y(x,y)}\,dy\,dx = H(\gamma) - \int_0^1\int_0^1 g(x,y)\log x\,dy\,dx + \int_0^1\int_0^1 g(x,y)\log G_y(x,y)\,dy\,dx. \tag{35}$$

Integrating over $y$, the first integral on the RHS (with its sign) is

$$\int_0^1 -\log x\,dx = 1, \tag{36}$$

while the second integral is

$$\int_0^1\int_0^1 \frac{\partial}{\partial x}\left(G_y\log G_y - G_y\right)dx\,dy = \int_0^1 (-1)\,dy = -1, \tag{37}$$

since $G(1,y) = y$ and $G(0,y) = 0$. So those two integrals cancel. □


6 12 patterns 

The number of occurrences $k(\pi)$ of the pattern 12 in a permutation of
$S_n$ has a simple generating function:

$$\sum_{\pi\in S_n} x^{k(\pi)} = \prod_{j=1}^{n-1}\left(1 + x + \cdots + x^{j}\right) = \sum_{i=0}^{\binom{n}{2}} C_i x^i. \tag{38}$$

One can see this by building up a permutation by insertions: when $i$ is
inserted into the list of $\{1,\ldots,i-1\}$, the number of 12 patterns
created is exactly one less than the position of $i$ in that list.
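
The product formula for the generating function of 12 patterns can be confirmed by brute force for small $n$. This is an illustrative sketch of ours; the function names are hypothetical, and the product is taken in the form $\prod_{j=1}^{n-1}(1 + x + \cdots + x^j)$.

```python
from itertools import combinations, permutations
from typing import List, Sequence


def count_12(sigma: Sequence[int]) -> int:
    """Number of index pairs i < j with sigma[i] < sigma[j]."""
    return sum(1 for i, j in combinations(range(len(sigma)), 2) if sigma[i] < sigma[j])


def brute_force_coefficients(n: int) -> List[int]:
    """coeffs[k] = number of permutations in S_n with exactly k occurrences of 12."""
    coeffs = [0] * (n * (n - 1) // 2 + 1)
    for sigma in permutations(range(1, n + 1)):
        coeffs[count_12(sigma)] += 1
    return coeffs


def product_coefficients(n: int) -> List[int]:
    """Coefficients of prod_{j=1}^{n-1} (1 + x + ... + x^j)."""
    coeffs = [1]
    for j in range(1, n):
        new = [0] * (len(coeffs) + j)
        for a, c in enumerate(coeffs):
            for i in range(j + 1):   # choose the term x^i of the j-th factor
                new[a + i] += c
        coeffs = new
    return coeffs
```

Each factor corresponds to one insertion step: an element entering a list of $j$ earlier elements can create any number $0, 1, \ldots, j$ of new 12 patterns, each in exactly one way.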

Theorem 2 suggests that to sample a permutation with a fixed density
$\rho \in [0,1]$ of occurrences of pattern 12, we should choose $x$ in the
above expression so that the monomial
$C_{[\rho n^2/2]}\,x^{[\rho n^2/2]}$ is the maximal one, and then use the
insertion probability measures which are (truncated) geometric random
variables with rate $x$.

Here $x$ is determined as a function of $\rho$ by Legendre duality (see below
for an exact formula). Let $r$ be defined by $e^{-r} = x$. In the limit of
large $n$, the truncated geometric insertion densities converge to truncated
exponential densities

$$f(x,y) = \frac{r e^{-ry}}{1 - e^{-rx}}\,\mathbf{1}_{[0,x]}(y). \tag{39}$$

We can reconstruct the permuton from these insertion densities as follows. 
Note that the CDF of the insertion measure is 

1 - P~ r V 


F(x,y) = 


(40) 


We need to solve the ODE (29), which in this case (to simplify notation we
changed $\tilde y$ to $y$) is

$$\frac{1 - e^{-rG(x,y)}}{1 - e^{-rx}} = \frac{dG(x,y)}{dx}. \tag{41}$$

This can be rewritten as

$$\frac{dx}{1 - e^{-rx}} = \frac{dG}{1 - e^{-rG}}. \tag{42}$$

Integrating both sides and solving for $G$ gives the CDF

$$G(x,y) = \frac{1}{r}\log\left(1 + \frac{(e^{rx}-1)(e^{ry}-1)}{e^{r}-1}\right), \tag{43}$$

which has density

$$g(x,y) = \frac{r(1 - e^{-r})}{\bigl(e^{r(1-x-y)/2} - e^{r(x-y-1)/2} - e^{r(y-x-1)/2} + e^{r(x+y-1)/2}\bigr)^{2}}. \tag{44}$$

See Figure 1 for some examples for varying $\rho$.
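
The closed forms for this permuton can be checked numerically. The sketch below assumes the CDF $G(x,y) = \frac{1}{r}\log\bigl(1 + (e^{rx}-1)(e^{ry}-1)/(e^{r}-1)\bigr)$ and the density (44), and tests two things: that $G$ satisfies the insertion ODE $(1 - e^{-rG})/(1 - e^{-rx}) = \partial G/\partial x$ up to finite-difference error, and that the density has uniform $x$-marginal. Function names and step sizes are our own illustrative choices.

```python
import math


def G(x, y, r):
    """Permuton CDF: (1/r) log(1 + (e^{rx}-1)(e^{ry}-1)/(e^r-1))."""
    return math.log1p(math.expm1(r * x) * math.expm1(r * y) / math.expm1(r)) / r


def g(x, y, r):
    """Permuton density, formula (44)."""
    d = (math.exp(r * (1 - x - y) / 2) - math.exp(r * (x - y - 1) / 2)
         - math.exp(r * (y - x - 1) / 2) + math.exp(r * (x + y - 1) / 2))
    return r * (1 - math.exp(-r)) / d ** 2


def ode_residual(x, y, r, h=1e-6):
    """(1 - e^{-rG}) / (1 - e^{-rx}) minus dG/dx (central difference), for 0 < x < 1."""
    lhs = math.expm1(-r * G(x, y, r)) / math.expm1(-r * x)
    rhs = (G(x + h, y, r) - G(x - h, y, r)) / (2 * h)
    return lhs - rhs


def x_marginal(x, r, steps=2000):
    """Midpoint-rule integral of g(x, .) over [0, 1]; a permuton requires the value 1."""
    return sum(g(x, (k + 0.5) / steps, r) for k in range(steps)) / steps
```

Both checks are expected to hold for negative as well as positive $r$ (with $r \neq 0$), since the formulas are symmetric under the obvious sign conventions.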

The permuton entropy of this permuton is obtained from (32), and as a
function of $r$ it is, using the dilogarithm,

$$H(r) = -\frac{2\,\mathrm{Li}_2(e^{r})}{r} + \frac{\pi^2}{3r} - 2\log(1 - e^{r}) + \log(e^{r} - 1) - \log(r) + 2. \tag{45}$$

The density $\rho$ of 12 patterns is the integral of the expectation of $f$:

$$\rho(r) = \frac{\pi^2}{3r^2} + \frac{r\bigl(r - 2\log(1 - e^{r}) + 2\bigr) - 2\,\mathrm{Li}_2(e^{r})}{r^2}; \tag{46}$$

see Figure 2 for $\rho$ as a function of $r$.
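
For numerical work with $r > 0$ it can be convenient to avoid the complex-valued intermediate terms $\mathrm{Li}_2(e^r)$ and $\log(1-e^r)$. The sketch below uses the equivalent real expressions $H(r) = 2 - \log r - \frac{\pi^2}{3r} + \frac{2}{r}\mathrm{Li}_2(e^{-r}) - \log(1 - e^{-r})$ and $\rho(r) = \frac{2}{r} - \frac{\pi^2}{3r^2} + \frac{2}{r^2}\mathrm{Li}_2(e^{-r}) - \frac{2}{r}\log(1 - e^{-r})$. These rewritings are our own (obtained via the dilogarithm inversion formula), not formulas stated in the text, and the code compares them with direct quadrature. The quadrature assumes that the entropy is $\int_0^1\!\int_0^x -f\log(xf)\,dy\,dx$ and that $\rho = 2\int_0^1 \mathbb{E}[Y_x]\,dx$ with $Y_x$ distributed according to the truncated exponential insertion density.

```python
import math


def li2(z, terms=4000):
    """Dilogarithm Li_2(z) by its power series; intended for 0 <= z < 1."""
    return sum(z ** k / k ** 2 for k in range(1, terms + 1))


def entropy_closed(r):
    """Real form of H(r) for r > 0 (an assumed equivalent rewriting)."""
    return (2 - math.log(r) - math.pi ** 2 / (3 * r)
            + 2 * li2(math.exp(-r)) / r - math.log(1 - math.exp(-r)))


def density_closed(r):
    """Real form of rho(r) for r > 0 (an assumed equivalent rewriting)."""
    return (2 / r - math.pi ** 2 / (3 * r ** 2)
            + 2 * li2(math.exp(-r)) / r ** 2 - 2 * math.log(1 - math.exp(-r)) / r)


def mean_insertion(x, r):
    """Mean of the density r e^{-ry} / (1 - e^{-rx}) on [0, x]."""
    return 1 / r - x / math.expm1(r * x)


def density_numeric(r, steps=20000):
    """rho = 2 * int_0^1 E[Y_x] dx, by the midpoint rule."""
    return 2 * sum(mean_insertion((k + 0.5) / steps, r) for k in range(steps)) / steps


def entropy_numeric(r, steps=20000):
    """int_0^1 ( -log x - log r + log(1 - e^{-rx}) + r E[Y_x] ) dx, by the midpoint rule."""
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) / steps
        total += (-math.log(x) - math.log(r) + math.log(-math.expm1(-r * x))
                  + r * mean_insertion(x, r))
    return total / steps
```

As $r \to 0^+$ both expressions approach the uniform values, $\rho \to 1/2$ and $H \to 0$.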

Figure 3 depicts the entropy as a function of $\rho$.

Figure 2: 12 density as function of $r$.

Figure 3: Entropy as function of 12 density.

7 Star models


In this section, we study the density of patterns of the form $*\,*\cdots*\,i$, which
we refer to as star models.

Equation (38) gives the generating function for occurrences of pattern 12.
For a permutation $\pi$ let $k_1 = k_1(\pi)$ be the number of 12 patterns. Let $k_2$
be the number of **3 patterns, that is, patterns of the form 123 or 213. A
similar argument to that giving (38) shows that the joint generating function
for $k_1$ and $k_2$ is

$$\sum_{k_1,k_2} C_{k_1,k_2}\,x^{k_1}y^{k_2} = \prod_{j=1}^{n}\Bigl(\sum_{i=0}^{j-1} x^{i}\,y^{i(i-1)/2}\Bigr). \tag{47}$$
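
The joint generating function for 12 and **3 patterns can be confirmed by enumeration for small $n$. In the sketch below (our own illustration, with hypothetical function names), a monomial $x^{k_1}y^{k_2}$ is stored as the key `(k1, k2)` of a `Counter`, and each factor of the product contributes $x^i y^{i(i-1)/2}$ for $i = 0, \ldots, j-1$.

```python
from collections import Counter
from itertools import combinations, permutations


def star_counts(perm):
    """Return (k1, k2): occurrences of 12, and of **3 (that is, 123 or 213)."""
    n = len(perm)
    k1 = sum(1 for i, j in combinations(range(n), 2) if perm[i] < perm[j])
    # **3: the last of the three chosen entries is the largest.
    k2 = sum(1 for i, j, l in combinations(range(n), 3)
             if perm[l] > perm[i] and perm[l] > perm[j])
    return k1, k2


def brute_force(n):
    """Counter mapping (k1, k2) to the number of permutations in S_n with those counts."""
    return Counter(star_counts(p) for p in permutations(range(n)))


def product_formula(n):
    """Expand prod_{j=1}^n sum_{i=0}^{j-1} x^i y^{i(i-1)/2} as a Counter of monomials."""
    poly = Counter({(0, 0): 1})
    for j in range(1, n + 1):
        new = Counter()
        for (a, b), c in poly.items():
            for i in range(j):
                new[(a + i, b + i * (i - 1) // 2)] += c
        poly = new
    return poly
```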

More generally, let $k_3$ be the number of patterns **2, that is, 132 or
312, and let $k_4$ be the number of **1 patterns, that is, 231 or 321. The joint
generating function for these four types of patterns is

$$\sum C_{k_1,k_2,k_3,k_4}\,x^{k_1}y^{k_2}z^{k_3}w^{k_4} = \prod_{j=1}^{n}\Bigl(\sum_{i=0}^{j-1} x^{i}\,y^{i(i-1)/2}\,z^{i(j-i-1)}\,w^{(j-i-1)(j-i-2)/2}\Bigr). \tag{48}$$

One can similarly write down the joint generating function for all patterns of
type $*\cdots*\,i$, with a string of some number $k$ of stars followed by some $i$ in
$[k+1]$. (Note that with this notation, 12 patterns are *2 patterns.) These
constitute a significant generalization of the Mallows model discussed in |3Sj • 


7.1 The *2/ **3 model 

By way of illustration, let us consider the simplest case of *2 (that is, 12) 
and **3. 

Theorem 13. The feasible region for $(\rho_{*2}, \rho_{**3})$ is the region bounded below
by the parameterized curve

$$(2t - t^2,\; 3t^2 - 2t^3)_{t\in[0,1]} \tag{49}$$

and above by the parameterized curve

$$(1 - t^2,\; 1 - t^3)_{t\in[0,1]}. \tag{50}$$

Figure 4: Feasible region for $(\rho_{*2}, \rho_{**3})$.

One can show that the permutons on the boundaries are unique and
supported on line segments of slopes $\pm 1$, and are as indicated in Figure 4.

Proof. While this can be proved directly from the generating function (47),
we give a simpler proof using the insertion density procedure. During the
insertion process let $I_{12}(x)$ be the fractional number of 12 patterns in the partial
permutation constructed up to time $x$. We want to stress the normalization
factor here: the number of 12 patterns in an $n$-permutation constructed up
to time $x$ should be thought of as $I_{12}(x)\,n^2$; in particular, $I_{12}(1) = \rho_{12}/2$.
So, we get that $I_{12}(x) = \int_0^x Y_t\,dt$, where $Y_t$ is the random variable giving the
location of the insertion of $t$. By the law of large numbers we can replace
$Y_t$ here by its mean value, that is, $I_{12}(x + dx) - I_{12}(x)$ is a sum (really, an
integral) of the independent insertions during time in $[x, x + dx]$, which have
mean $\mathbb{E}[Y_t]$, so

$$I_{12}'(t) = \mathbb{E}[Y_t].$$

Let $I_{**3}(x)$ likewise be the fraction of **3 patterns created by time $x$. We
have

$$I_{**3}(x) = \int_0^x \mathbb{E}[Y_t]^2/2\,dt. \tag{51}$$

Note that $I_{**3}(1) = (\rho_{123} + \rho_{213})/6$.



Figure 5: Permutons with $(\rho_{*2}, \rho_{**3}) = (.5, .2)$, and $(.5, .53)$ respectively.

Let us fix $\rho_{12} = 2\,I_{12}(1)$. To maximize $I_{**3}(1)$, we need to maximize

$$\int_0^1 \bigl(I_{12}'(t)\bigr)^2\,dt \quad\text{subject to}\quad \int_0^1 I_{12}'(t)\,dt = \rho_{12}/2. \tag{52}$$

This is achieved by making $I_{12}'(t)$ either zero or maximal. Since $I_{12}'(t) \le t$, we
can achieve this by inserting points at the beginning for as long as possible
and then inserting points at the end, that is, $Y_t = 0$ up to $t = a$ and then
$Y_t = t$ for $t \in [a, 1]$. The resulting permuton is then as shown in Figure 4:
on the square $[0, a]^2$ it is a descending diagonal and on the square $[a, 1]^2$ it is
an ascending diagonal.

Likewise to minimize the above integral (52) we need to make the derivatives
$I_{12}'(t)$ as equal as possible. Since $I_{12}'(t) \le t$, this involves setting
$I_{12}'(t) = t$ up to $t = a$ and then having it constant after that. The resulting
permuton is then as shown in Figure 4: on the square $[0,a]^2$ it is an
ascending diagonal and on the square $[a, 1]^2$ it is a descending diagonal.

A short calculation now yields the algebraic form of the boundary curves. 

□ 
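
The "short calculation" can be reproduced numerically. The sketch below is our own illustration: it takes the two extremal insertion strategies from the proof, integrates $\rho_{*2} = 2\int_0^1 I_{12}'(t)\,dt$ and $\rho_{**3} = 6\int_0^1 I_{12}'(t)^2/2\,dt$ by the midpoint rule, and the results can be compared with the curves $(2a - a^2,\ 3a^2 - 2a^3)$ and $(1 - a^2,\ 1 - a^3)$. It assumes deterministic insertion locations, as on the boundary; the function names are hypothetical.

```python
def densities_from_rate(rate, steps=20000):
    """Return (rho_*2, rho_**3) for a deterministic insertion rate t -> I_12'(t)."""
    s1 = s2 = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps
        v = rate(t)
        s1 += v
        s2 += v * v
    # rho_*2 = 2 * int v dt,  rho_**3 = 6 * int v^2 / 2 dt = 3 * int v^2 dt
    return 2 * s1 / steps, 3 * s2 / steps


def lower_rate(a):
    """Derivatives as equal as possible: I_12'(t) = t up to t = a, then constant a."""
    return lambda t: min(t, a)


def upper_rate(a):
    """Derivative zero or maximal: I_12'(t) = 0 up to t = a, then I_12'(t) = t."""
    return lambda t: 0.0 if t < a else t
```

Sweeping $a$ over $[0,1]$ traces out the lower and upper boundary curves of the feasible region.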


Using the insertion density procedure outlined earlier, we see that the
permuton as a function of $x, y$ has an explicit analytic density (which cannot,
however, be written in terms of elementary functions). The permutons for
some values of $(\rho_{*2}, \rho_{**3})$ are shown in Figure 5.

The entropy $s(\rho_{*2}, \rho_{**3})$ is plotted in Figure 6. It is strictly concave (see
Theorem 14 below) and achieves its maximal value, zero, precisely at the
point $(1/2, 1/3)$, the uniform measure.


7.2 Concavity and analyticity of entropy for star models

Theorem 14. For a star model with a finite number of densities $\rho_1, \ldots, \rho_k$
of patterns $\tau_1, \ldots, \tau_k$, respectively, the feasible region is convex and the entropy
$H(\rho_1, \ldots, \rho_k)$ is strictly concave and analytic on the feasible region. For each
$\rho_1, \ldots, \rho_k$ in the interior of the feasible region there is a unique entropy-maximizing
permuton with those densities, and this permuton has analytic
probability density.

One can construct examples where the feasible region is not strictly convex:
e.g. in the case of densities **1 and **3.

Proof. Let $k_i$ be the length of the pattern $\tau_i$.

The generating function for permutations of $[n]$ counting patterns $\tau_i$ is

$$Z_n(x_1, \ldots, x_k) = \sum_{\pi\in S_n} x_1^{n_1}\cdots x_k^{n_k} \tag{53}$$

where $n_i = n_i(\pi)$ is the number of occurrences of pattern $\tau_i$ in $\pi$. The number
of permutations with density $\rho_i$ of pattern $\tau_i$ is the sum of the coefficients of
the terms $x_1^{n_1}\cdots x_k^{n_k}$ with $n_i \approx n^{k_i}\rho_i/k_i!$. The entropy $H(\rho_1, \ldots, \rho_k)$ is the log of
this sum, minus $\log n!$ (and normalized by dividing by $n$).

Figure 6: The entropy function on the parameter space for $\rho_{12}, \rho_{**3}$.


As discussed above, $Z_n$ can be written as a product generalizing (48).
Write $x_i = e^{a_i}$. Then the product expression for $Z_n$ is

$$Z_n = \prod_{j=1}^{n}\sum_{i=0}^{j} e^{p(i,j)} \tag{54}$$

where $p(i,j)$ is a polynomial in $i$ and $j$ with coefficients that are linear in the
$a_i$. For large $n$ it is convenient to normalize the $a_i$ by an appropriate power
of $n$ (and a combinatorial factor): write

$$x_i = e^{a_i} = \exp\bigl(\alpha_i/n^{k_i-1}\bigr). \tag{55}$$


Writing $i/n = t$ and $j/n = x$, the expression for $\log Z_n$ is then a Riemann
sum, once normalized. In the limit $n \to \infty$ the “normalized free energy” $F$
is

$$F := \lim_{n\to\infty}\frac{1}{n}\bigl(\log Z_n - \log n!\bigr) = \int_0^1\left[\log\int_0^x e^{\tilde p(t,x)}\,dt\right]dx \tag{56}$$

where $\tilde p(t,x) = p(nt,nx) + o(1)$ is a polynomial in $t$ and $x$, independent of
$n$, with coefficients which are linear functions of the $\alpha_i$. Explicitly we have

$$\tilde p(t,x) = \sum_{i=1}^{k}\alpha_i\,\frac{t^{r_i}(x-t)^{s_i}}{r_i!\,s_i!} \tag{57}$$

where $r_i + s_i = k_i - 1$ and, if $\tau_i = *\cdots*\,\ell_i$, then $s_i = k_i - \ell_i$.

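We now show that $F$ is concave as a function of the $\alpha_i$, by computing its
Hessian matrix. We have

$$\frac{\partial F}{\partial\alpha_i} = \int_0^1\left\langle\frac{T^{r_i}(x-T)^{s_i}}{r_i!\,s_i!}\right\rangle dx \tag{58}$$

where $T \in [0,x]$ is the random variable with (unnormalized) density $e^{\tilde p(t,x)}$,
and $\langle\cdot\rangle$ is the expectation with respect to this probability measure.
Differentiating a second time we have

$$\frac{\partial^2 F}{\partial\alpha_j\,\partial\alpha_i} = \int_0^1\left[\left\langle\frac{T^{r_i+r_j}(x-T)^{s_i+s_j}}{r_i!\,s_i!\,r_j!\,s_j!}\right\rangle - \left\langle\frac{T^{r_i}(x-T)^{s_i}}{r_i!\,s_i!}\right\rangle\left\langle\frac{T^{r_j}(x-T)^{s_j}}{r_j!\,s_j!}\right\rangle\right]dx = \int_0^1 \mathrm{Cov}\left[\frac{T^{r_i}(x-T)^{s_i}}{r_i!\,s_i!},\ \frac{T^{r_j}(x-T)^{s_j}}{r_j!\,s_j!}\right]dx \tag{59}$$

where Cov is the covariance.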

The covariance matrix of a set of random variables with no linear de¬ 
pendencies is positive definite. Thus we see that the Hessian matrix is an 
integral of positive definite matrices and so is itself positive definite. This 
completes the proof of strict concavity of the free energy F. 
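
For the simplest star model (the single pattern *2, so that the exponent is $\alpha t$) the Hessian computation can be illustrated numerically. The sketch below is only a check under that assumption: it evaluates $F(\alpha) = \int_0^1 \log\int_0^x e^{\alpha t}\,dt\,dx$ by quadrature, takes a second finite difference in $\alpha$, and compares it with the integrated variance $\int_0^1 \mathrm{Var}(T)\,dx$. Function names and grid sizes are our own.

```python
import math


def log_inner(alpha, x):
    """log of int_0^x exp(alpha * t) dt for the *2 model."""
    if abs(alpha) < 1e-12:
        return math.log(x)
    return math.log(math.expm1(alpha * x) / alpha)


def free_energy(alpha, steps=2000):
    """Midpoint-rule approximation of F(alpha) = int_0^1 log_inner(alpha, x) dx."""
    return sum(log_inner(alpha, (k + 0.5) / steps) for k in range(steps)) / steps


def variance_T(alpha, x, steps=400):
    """Variance of T on [0, x] with unnormalized density exp(alpha * t)."""
    ts = [(k + 0.5) * x / steps for k in range(steps)]
    ws = [math.exp(alpha * t) for t in ts]
    z = sum(ws)
    m1 = sum(w * t for w, t in zip(ws, ts)) / z
    m2 = sum(w * t * t for w, t in zip(ws, ts)) / z
    return m2 - m1 * m1


def hessian_from_covariance(alpha, steps=400):
    """int_0^1 Var(T) dx: the covariance form of the second derivative, one pattern."""
    return sum(variance_T(alpha, (k + 0.5) / steps) for k in range(steps)) / steps


def hessian_finite_difference(alpha, h=1e-3):
    """Second central difference of the free energy in alpha."""
    return (free_energy(alpha + h) - 2 * free_energy(alpha) + free_energy(alpha - h)) / h ** 2
```

At $\alpha = 0$ the variable $T$ is uniform on $[0,x]$, so the integrated variance is $\int_0^1 x^2/12\,dx = 1/36$; in every case tried the two computations agree and the value is positive.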

Since $Z_n$ is the (unnormalized) probability generating function, the vector
of densities as a function of the $\{\alpha_i\}$ is obtained for each $n$ by the gradient
of the logarithm

$$(\rho_1, \ldots, \rho_k) = \frac{1}{n}\nabla\log Z_n(\alpha_1, \ldots, \alpha_k). \tag{60}$$

In the limit we can replace $\frac{1}{n}\nabla\log Z_n$ by $\nabla F$; by strict concavity of $F$ its
gradient is injective, and surjective onto the interior of the feasible region.
In particular there is a unique choice of $\alpha_i$'s for every choice of densities in
the interior of the feasible region. Note that the $\alpha_i$'s determine the insertion
measures (these are the measures with unnormalized density $e^{\tilde p(t,x)}$) and thus
the permuton itself, proving uniqueness of the entropy maximizer. Analyticity
of the probability density is a consequence of analyticity of the associated
differential equation (29).


By strict concavity of the free energy, we can relate the free energy to
the entropy by the following standard argument. Referring back to the first
paragraph of the proof, we have proven that, when $x_i = e^{a_i}$, the generating
function $Z_n$ concentrates its mass on the terms $x_1^{n_1}\cdots x_k^{n_k}$ for which $n_i = n^{k_i}\rho_i/k_i!$
(where $\alpha_i$ and $\rho_i$ are related by (60)), in the sense that a fraction $1 - o(1)$
of the total mass of $Z_n$ is on these terms. The entropy is the log of the sum
of the coefficients in front of these relevant terms. The entropy can thus be
obtained from the free energy $\frac{1}{n}\log(Z_n/n!)$ by subtracting off $\frac{1}{n}\log(x_1^{n_1}\cdots x_k^{n_k})$.
This shows that the entropy function $H$ is the Legendre dual of $F$, that is,

$$H(\rho_1, \ldots, \rho_k) = \max_{\{\alpha_i\}}\Bigl\{F(\alpha_1, \ldots, \alpha_k) - \sum_i \alpha_i\rho_i\Bigr\}. \tag{61}$$

Analyticity of $F$ implies that $H$ is both analytic and strictly concave.

The “upper level sets” $\{\rho : H(\rho) > -M\}$ of $H$ are convex by concavity
of $H$. Their union is the interior of the feasible region, which, being an
increasing union of convex sets, is convex. □


8 PDEs for permutons 

For permutations with constraints on patterns of length 3 (or less) one can 
write explicit PDEs for the maximizers. It is possible that these may be 
used to show either analyticity or uniqueness, or both (although we have 
accomplished neither goal). 

Let us first redo the case of 12-patterns, which we already worked out by
another method in Section 6.

8.1 Patterns 12

The density of patterns 12 is given in (2). Consider the problem of maximizing
$H(\gamma)$ subject to the constraint $I_{12}(\gamma) = \rho$. This involves finding a
solution to the Euler-Lagrange equation

$$dH + \alpha\,dI_{12} = 0 \tag{62}$$

for some constant $\alpha$, for all variations $g \mapsto g + \varepsilon h$ fixing the marginals.

Given points $(a_1, b_1), (a_2, b_2) \in [0,1]^2$ we can consider the change in $H$ and
$I_{12}$ when we remove an infinitesimal mass $\delta$ from $(a_1, b_1)$ and $(a_2, b_2)$ and add
it to locations $(a_1, b_2)$ and $(a_2, b_1)$. (Note that two measures with the same
marginals are connected by convolutions of such operations.) The change in
$H$ to first order under such an operation is $\delta$ times (letting $S_0(p) := -p\log p$)

$$-S_0'(g(a_1,b_1)) - S_0'(g(a_2,b_2)) + S_0'(g(a_1,b_2)) + S_0'(g(a_2,b_1)) = \log\frac{g(a_1,b_1)\,g(a_2,b_2)}{g(a_1,b_2)\,g(a_2,b_1)}. \tag{63}$$

The change in $I_{12}$ to first order is $\delta$ times

$$\sum_{i,j=1}^{2}(-1)^{i+j}\left(\int_{a_i<x_2,\ b_j<y_2} g(x_2,y_2)\,dx_2\,dy_2 + \int_{x_1<a_i,\ y_1<b_j} g(x_1,y_1)\,dx_1\,dy_1\right) = \sum_{i,j=1}^{2}(-1)^{i+j}\bigl(G(a_i,b_j) + (1 - a_i - b_j + G(a_i,b_j))\bigr). \tag{64}$$

Differentiating (62) with respect to $a = a_1$ and $b = b_1$, we find

$$\frac{\partial}{\partial a}\frac{\partial}{\partial b}\log g(a,b) + 2\alpha\,g(a,b) = 0. \tag{65}$$


One can check that the formula (44) satisfies this PDE. 
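
This check can also be done numerically. The sketch below evaluates the density (44) and approximates the mixed derivative of $\log g$ by central differences, then forms the residual of the PDE with the choice $\alpha = r$. That identification of $\alpha$ with $r$ is our own, obtained by differentiating (44) directly, and is not stated in the text; the function names are illustrative.

```python
import math


def g(x, y, r):
    """Density (44) of the entropy-maximizing permuton with fixed 12 density."""
    d = (math.exp(r * (1 - x - y) / 2) - math.exp(r * (x - y - 1) / 2)
         - math.exp(r * (y - x - 1) / 2) + math.exp(r * (x + y - 1) / 2))
    return r * (1 - math.exp(-r)) / d ** 2


def log_g_mixed(x, y, r, h=1e-3):
    """Central-difference approximation of the mixed derivative of log g at (x, y)."""
    def f(u, v):
        return math.log(g(u, v, r))
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)


def pde_residual(x, y, r):
    """Residual of (log g)_xy + 2 * alpha * g = 0 with alpha = r (our assumption)."""
    return log_g_mixed(x, y, r) + 2 * r * g(x, y, r)
```

The residual should be zero up to the $O(h^2)$ discretization error of the finite difference.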


8.2 Patterns 123 

The density of patterns 123 is

$$I_{123}(\gamma) = 6\int_{x_1<x_2<x_3,\ y_1<y_2<y_3} g(x_1,y_1)\,g(x_2,y_2)\,g(x_3,y_3)\,dx_1\,dx_2\cdots dy_3. \tag{66}$$

Under a similar perturbation as above the change in $I_{123}$ to first order is $\delta$
times

$$dI_{123} = 6\sum_{i,j=1}^{2}(-1)^{i+j}\Biggl(\int_{a_i<x_2<x_3,\ b_j<y_2<y_3} g(x_2,y_2)\,g(x_3,y_3)\,dx_2\,dx_3\,dy_2\,dy_3 + \int_{x_1<a_i<x_3,\ y_1<b_j<y_3} g(x_1,y_1)\,g(x_3,y_3)\,dx_1\,dx_3\,dy_1\,dy_3 + \int_{x_1<x_2<a_i,\ y_1<y_2<b_j} g(x_1,y_1)\,g(x_2,y_2)\,dx_1\,dx_2\,dy_1\,dy_2\Biggr). \tag{67}$$

The middle integral here is a product

$$\int_{x_1<a_i,\ y_1<b_j} g(x_1,y_1)\,dx_1\,dy_1\int_{a_i<x_3,\ b_j<y_3} g(x_3,y_3)\,dx_3\,dy_3 = G(a_i,b_j)\bigl(1 - a_i - b_j + G(a_i,b_j)\bigr). \tag{68}$$

Differentiating each of these three integrals with respect to both $a = a_1$
and $b = b_1$ (then only the $i = j = 1$ term survives) gives, for the first integral

$$g(a,b)\int_{a<x_3,\ b<y_3} g(x_3,y_3)\,dx_3\,dy_3 = g(a,b)\bigl(1 - a - b + G(a,b)\bigr), \tag{69}$$

for the second integral

$$g(a,b)\bigl(1 - a - b + 2G(a,b)\bigr) + G_x(a,b)\bigl(-1 + G_y(a,b)\bigr) + G_y(a,b)\bigl(-1 + G_x(a,b)\bigr), \tag{70}$$

and the third integral

$$g(a,b)\int_{x_1<a,\ y_1<b} g(x_1,y_1)\,dx_1\,dy_1 = g(a,b)\,G(a,b). \tag{71}$$

Summing, we get (changing $a, b$ to $x, y$)

$$(dI_{123})_{xy} = 12\,G_{xy}(1 - x - y + 2G) + 12\,G_xG_y - 6G_x - 6G_y. \tag{72}$$

Thus the Euler-Lagrange equation is

$$(\log G_{xy})_{xy} + 6\alpha\bigl(2G_{xy}(1 - x - y + 2G) + 2G_xG_y - G_x - G_y\bigr) = 0. \tag{73}$$

This simplifies somewhat if we define $K(x,y) = 2G(x,y) - x - y + 1$.
Then

$$(\log K_{xy})_{xy} + 3\alpha\bigl(2K_{xy}K + K_xK_y - 1\bigr) = 0. \tag{74}$$
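
The passage from the equation in $G$ to the equation in $K$ is a purely algebraic identity in $x$, $y$, $G$, $G_x$, $G_y$ and $G_{xy}$ (note also that $\log K_{xy} = \log 2 + \log G_{xy}$, so the first terms agree). The sketch below, our own illustration, treats these six quantities as independent numbers and compares the two bracketed expressions at random values.

```python
import random


def bracket_G(x, y, G, Gx, Gy, Gxy):
    """6 * (2 G_xy (1 - x - y + 2G) + 2 G_x G_y - G_x - G_y)."""
    return 6 * (2 * Gxy * (1 - x - y + 2 * G) + 2 * Gx * Gy - Gx - Gy)


def bracket_K(x, y, G, Gx, Gy, Gxy):
    """3 * (2 K_xy K + K_x K_y - 1) with K = 2G - x - y + 1."""
    K = 2 * G - x - y + 1
    Kx, Ky, Kxy = 2 * Gx - 1, 2 * Gy - 1, 2 * Gxy
    return 3 * (2 * Kxy * K + Kx * Ky - 1)


def max_discrepancy(trials=1000, seed=0):
    """Largest absolute difference between the two brackets over random inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        vals = [rng.uniform(0, 1) for _ in range(6)]
        worst = max(worst, abs(bracket_G(*vals) - bracket_K(*vals)))
    return worst
```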

In a similar manner we can find a PDE for the permuton with fixed
densities of other patterns of length 3. In fact one can proceed similarly for 
longer patterns, getting systems of PDEs, but the complexity grows with the 
length. 

9 The 12/123 model 

When we fix the density of patterns 12 and 123, the feasible region has a
complicated structure, see Figure 7.

Figure 7: The feasible region for $\rho_{12}$ versus $\rho_{123}$, with corresponding
permutons (computed numerically) at selected points.

Theorem 15. The feasible region for $\rho_{12}$ versus $\rho_{123}$ is the same as the
feasible region of edges and triangles in the graphon model.

Proof. Let $\mathcal{R}$ denote the feasible region for pairs $(\rho_{12}(\gamma), \rho_{123}(\gamma))$ consisting
of the 12 density and 123 density of a permuton (equivalently, for the closure
of the set of such pairs for finite permutations).

Each permutation $\pi \in S_n$ determines a (two-dimensional) poset $P_\pi$ on
$\{1, \ldots, n\}$ given by $i \prec j$ in $P_\pi$ iff $i < j$ and $\pi_i < \pi_j$. The comparability
graph $G(P)$ of a poset $P$ links two points if they are comparable in $P$, that
is, $x \sim y$ if $x \prec y$ or $y \prec x$. Then $i \sim j$ in $G(P_\pi)$ precisely when $\{i,j\}$
constitutes an incidence of the pattern 12, and $i \sim j \sim k \sim i$ when $\{i,j,k\}$
constitutes an incidence of the pattern 123. Thus the 12 density of $\pi$ is equal
to the edge density of $G(P_\pi)$, and the 123 density of $\pi$ is the triangle density
of $G(P_\pi)$, that is, the probability that three random vertices induce the
complete graph $K_3$. This correspondence extends perfectly to limit objects,
equating 12 and 123 densities of permutons to edge densities and triangle
densities of graphons.
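
The correspondence between pattern densities and comparability-graph densities is easy to test on a finite permutation. The following sketch is an illustration with function names of our own choosing: it computes pattern densities directly from the definition and, separately, the edge and triangle densities of the comparability graph.

```python
from itertools import combinations


def pattern_density(perm, pattern):
    """Fraction of k-subsets of positions whose entries are ordered like `pattern`."""
    k = len(pattern)
    target = sorted(range(k), key=lambda i: pattern[i])
    subsets = list(combinations(range(len(perm)), k))
    hits = 0
    for idx in subsets:
        vals = [perm[i] for i in idx]
        if sorted(range(k), key=lambda i: vals[i]) == target:
            hits += 1
    return hits / len(subsets)


def comparability_edges(perm):
    """Edges i ~ j (i < j) of the comparability graph: i < j and perm[i] < perm[j]."""
    n = len(perm)
    return {(i, j) for i, j in combinations(range(n), 2) if perm[i] < perm[j]}


def edge_and_triangle_density(perm):
    """Edge density and triangle density of the comparability graph of `perm`."""
    n = len(perm)
    edges = comparability_edges(perm)
    pairs = n * (n - 1) // 2
    triples = n * (n - 1) * (n - 2) // 6
    triangles = sum(1 for i, j, k in combinations(range(n), 3)
                    if (i, j) in edges and (j, k) in edges and (i, k) in edges)
    return len(edges) / pairs, triangles / triples
```

For any permutation of length at least 3, the pair `(pattern_density(perm, (1, 2)), pattern_density(perm, (1, 2, 3)))` should coincide with `edge_and_triangle_density(perm)`.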

The feasible region for edge and triangle densities of graphs (now, for
graphons) has been studied for many years and was finally determined by
Razborov [33]; we call it the “scalloped triangle” $T$. It follows from the above
discussion that the feasibility region $\mathcal{R}$ we seek for permutons is a subset of
$T$, and it remains only to prove that $\mathcal{R}$ is all of $T$. In fact we can realize $T$
using only a rather simple two-parameter family of permutons.

Let reals $a, b$ satisfy $0 < a \le 1$ and $0 < b \le a/2$, and set $k := \lfloor a/b \rfloor$.
Let us denote by $\gamma_{a,b}$ the permuton consisting of the following diagonal line
segments, all of equal density:

1. The segment $y = 1 - x$, for $0 \le x \le 1-a$;

2. The $k$ segments $y = (2j-1)b + 1 - a - x$ for $1-a+(j-1)b < x \le 1-a+jb$,
for each $j = 1, 2, \ldots, k$;

3. The remaining, rightmost segment $y = 1 + kb - x$, for $1-a+kb < x \le 1$.
(See Fig. 8 below.)

We interpret $\gamma_{a,0}$ as the permuton containing the segment $y = 1 - x$,
for $0 \le x \le 1-a$, and the positive-slope diagonal from $(1-a, 0)$ to $(1, a)$;
finally, $\gamma_{0,0}$ is just the reverse diagonal from $(0,1)$ to $(1,0)$. These interpretations
are consistent in the sense that $\rho_{12}(\gamma_{a,b})$ and $\rho_{123}(\gamma_{a,b})$ are continuous
functions of $a$ and $b$ on the triangle $0 \le a \le 1$, $0 \le b \le a/2$. (In fact, $\gamma_{a,b}$ is
itself continuous in the topology of $\Gamma$, so all pattern densities are continuous.)

Figure 8: Support of the permutons $\gamma_{.7,.2}$ and $\gamma_{.7,0}$.

It remains only to check that the comparability graphons corresponding
to these permutons match extremal graphs in [33] as follows:

• $\gamma_{a,0}$ maps to the upper left boundary of $T$, with $\gamma_{0,0}$ going to the lower
left corner while $\gamma_{1,0}$ goes to the top;

• $\gamma_{a,a/2}$ goes to the bottom line, with $\gamma_{1,1/2}$ going to the lower right corner;

• For $1/(k+2) \le b \le 1/(k+1)$, $\gamma_{1,b}$ goes to the $k$th lowest scallop, with
$\gamma_{1,1/(k+1)}$ going to the bottom cusp of the scallop and $\gamma_{1,1/(k+2)}$ to the
top.

It follows that $(a,b) \mapsto (\rho_{12}(\gamma_{a,b}), \rho_{123}(\gamma_{a,b}))$ maps the triangle $0 \le a \le 1$,
$0 \le b \le a/2$ onto all of $T$, proving the theorem. □
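
The densities of the two-parameter family can be computed in closed form. The sketch below rests on our reading of the construction: the mass $a$ to the right of $x = 1-a$ splits into $k$ descending blocks of mass $b$ and one block of mass $a - kb$, two points form a 12 pattern exactly when they lie in different blocks, and the leftmost segment contributes no 12 patterns at all. Exact rational arithmetic keeps the floor $\lfloor a/b \rfloor$ reliable; the function names are ours, and $b > 0$ is required.

```python
from fractions import Fraction
from math import floor


def block_masses(a, b):
    """Masses of the descending blocks of gamma_{a,b} lying in [1-a, 1] x [0, a]."""
    a, b = Fraction(a), Fraction(b)
    k = floor(a / b)
    masses = [b] * k
    rest = a - k * b
    if rest > 0:
        masses.append(rest)
    return masses


def densities(a, b):
    """(rho_12, rho_123): chance that 2 (resp. 3) random points lie in distinct blocks."""
    m = block_masses(a, b)
    p1 = sum(m)
    p2 = sum(x ** 2 for x in m)
    p3 = sum(x ** 3 for x in m)
    e2 = (p1 ** 2 - p2) / 2                       # elementary symmetric polynomial e_2
    e3 = (p1 ** 3 - 3 * p1 * p2 + 2 * p3) / 6     # elementary symmetric polynomial e_3
    return 2 * e2, 6 * e3
```

For example, with $a = 1$ and $b = 1/2$ this gives 12 density $1/2$ and 123 density $0$, and with $a = 1$, $b = 1/3$ it gives $2/3$ and $2/9$, the values attained by balanced complete bipartite and tripartite graphs.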

It may be prudent to remark at this point that while the feasible region
for 12 versus 123 density of permutons is the same as that for edge and
triangle density of graphs, the topography of the corresponding entropy functions
within this region is entirely different. In the graph case the entropy
landscape is studied in [30, 31, 32]; one of its features is a ridge along the
“Erdős-Rényi” curve (where triangle density is the 3/2 power of edge density).
There is a sharp drop-off below this line, which represents the very high
entropy graphs constructed by choosing edges independently with constant
probability. The graphons that maximize entropy at each point of the feasible
region all appear to be very combinatorial in nature: each has a partition
of its vertices into finitely many classes, with constant edge density between
any two classes and within any class, and is thus described by a finite list of
real parameters.

The permuton topography features a different high curve, representing the 
permutons (discussed above) that maximize entropy for a fixed 12 density. 
Moreover, the permutons that maximize entropy at interior points of the 
region appear, as in other regions discussed above, always to be analytic. 

We do not know explicitly the maximizing permutons (although they
satisfy an explicit PDE, see Section 8) or the entropy function.

10 123/321 case 

The feasible region for fixed densities $\rho_{123}$ versus $\rho_{321}$ is the same as the
feasible region $B$ for triangle density $x = d(K_3, G)$ versus anti-triangle density
$y = d(\overline{K}_3, G)$ of graphons [18]. Let $C$ be the line segment $x + y = \frac14$ for
$0 \le x \le \frac14$, $D$ the $x$-axis from $x = \frac14$ to $x = 1$, and $E$ the $y$-axis from $y = \frac14$ to
$y = 1$. Let $F_1$ be the curve given parametrically by $(x,y) = (t^3, (1-t)^3 + 3t(1-t)^2)$,
for $0 \le t \le 1$, and $F_2$ its symmetric twin $(x,y) = ((1-t)^3 + 3t(1-t)^2, t^3)$.
Then $B$ is the union of the area bounded by $C$, $D$, $E$ and $F_1$ and the area
bounded by $C$, $D$, $E$ and $F_2$.

The curves $F_1$ and $F_2$ cross at a concave “dimple” $(r, r)$ where
$r = s^3 = (1-s)^3 + 3s(1-s)^2$, with $s \approx .653$ and $r \approx .278$; see Fig. 9.
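
The numerical values at the dimple can be recovered by solving $s^3 = (1-s)^3 + 3s(1-s)^2$ on $(1/2, 1)$. The sketch below uses plain bisection; the function name and tolerance are our own illustrative choices.

```python
def dimple(tol=1e-12):
    """Solve s^3 = (1-s)^3 + 3s(1-s)^2 on (1/2, 1) by bisection; return (s, r = s^3)."""
    def f(s):
        return s ** 3 - ((1 - s) ** 3 + 3 * s * (1 - s) ** 2)

    lo, hi = 0.5, 1.0  # f(0.5) < 0 < f(1.0), and f is increasing on this interval
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    s = (lo + hi) / 2
    return s, s ** 3
```

The equation is equivalent to the cubic $s^3 - 3s^2 + 1 = 0$, which has exactly one root in $(0,1)$.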

To see that $B$ is also the feasible region for 123 versus 321 density of
permutons, an argument much like the one above for 12 versus 123 can be
(and was, by [10]) given. Permutons realizing various boundary points are
illustrated in Fig. 9; they correspond to the extremal graphons described in
[18]. The rest are filled in by parameterization and a topological argument.

Of note for both graphons and permutons is the double solution at the
dimple. These solutions are significantly different, as evidenced by the fact
that their edge-densities (12 densities, for the permutons) differ. This multiplicity
of solutions, if there are no permutons bridging the gap, suggests a
phase transition in the entropy-optimal permuton in the interior of $B$ in a
neighborhood of the dimple. In fact, we can use a stability theorem from [17]
to show that the phenomenon is real.

Figure 9: The feasible region for $\rho_{123}, \rho_{321}$. It is bounded above by the
parameterized curves $(1 - 3t^2 + 2t^3, t^3)$ and $(t^3, 1 - 3t^2 + 2t^3)$ which intersect
at $(x, y) = (.278\ldots, .278\ldots)$. The lower boundaries consist of the axes and the
line $x + y = 1/4$.

Before stating the next theorem, we need a definition: two $n$-vertex graphs
are $\varepsilon$-close if one can be made isomorphic to the other by adding or deleting
at most $\varepsilon\cdot\binom{n}{2}$ edges.

Theorem 16 (special case of Theorems 1.1 and 1.2 of [17]). For any $\varepsilon > 0$
there is a $\delta > 0$ and an $N$ such that any $n$-vertex graph $G$ with $n > N$,
$d(\overline{K}_3, G) \ge p$ and $|d(K_3, G) - M_p| < \delta$ is $\varepsilon$-close to a graph $H$ on $n$ vertices
consisting of a clique and isolated vertices, or the complement of a graph
consisting of a clique and isolated vertices. Here
$M_p := \max\bigl((1 - p^{1/3})^3 + 3p^{1/3}(1 - p^{1/3})^2,\ (1 - q)^3\bigr)$
where $q$ is the unique real root of $q^3 + 3q^2(1 - q) = p$;
that is, $M_p$ is the largest possible value of $d(K_3, G)$ given $d(\overline{K}_3, G) = p$.


From Theorem 16 we derive the following lemma. Note that there are in
fact many permutons representing the dimple $(r, r)$ of the feasible region for
123 versus 321, but only two classes if we consider permutons with isomorphic
comparability graphs to be equivalent. The class that came from the curve
$F_1$ has 12 density $s^2 \approx .426$, the other $1 - s^2 \approx .574$. (Interestingly, the other
end of the $F_1$ curve, represented uniquely by the identity permuton, had 12
density 1, while the $F_2$ class “began” at 12 density 0. Thus, the 12 densities
crossed on the way in from the corners of $B$.)

Lemma 17. There is a neighborhood of the point $(r, r)$ in the feasible region
for patterns 123 and 321 within which no permuton has 12-density near $1/2$.

Proof. Apply Theorem 16 with $\varepsilon = .07$ to get $\delta > 0$ with the property
stated in the theorem. Let $\delta' = \min(\delta/2, (M_{r-\delta} - r)/2)$, which yields that
$|M_p - r| < \delta/2$ for $p \in [r - \delta', r]$. So, if $|\rho_{123}(\gamma) - r| < \delta' \le \delta/2$ and
$p \in [r - \delta', r]$, then $|\rho_{123}(\gamma) - M_p| < \delta$ as required by the hypothesis of
Theorem 16 (noting that $\rho_{123}(\gamma)$ is the triangle density of the comparability
graph corresponding to $\gamma$). We conclude that any permuton $\gamma$ such that
$(\rho_{123}(\gamma), \rho_{321}(\gamma))$ lies in the rectangle $[r - \delta', r + \delta'] \times [r - \delta', r]$ has 12-density
within .07 of either .426 or .574, thus outside the range $[.496, .504]$.

The symmetric argument gives the same conclusion for $(\rho_{123}(\gamma), \rho_{321}(\gamma))$
in the rectangle $[r - \delta', r] \times [r - \delta', r + \delta']$. Since there are no permutons $\gamma$
with both $\rho_{123}(\gamma)$ and $\rho_{321}(\gamma)$ larger than $r$, the lemma follows. □

11 Proof of Theorem 1


For completeness, we now give a proof of Theorem 1. We begin with a simple
lemma.

Lemma 18. The function $H : \Gamma \to \mathbb{R}$ is upper semicontinuous.

Proof. Let $\gamma_1, \gamma_2, \ldots$ be a sequence of permutons approaching the permuton
$\gamma$ (in the $d_\square$-topology); we need to show that $H(\gamma) \ge \limsup H(\gamma_n)$.

If $H(\gamma)$ is finite, fix $\varepsilon > 0$ and take $m$ large enough so that
$|H(\gamma^m) - H(\gamma)| < \varepsilon$; then since $H(\gamma_n^m) \ge H(\gamma_n)$ by concavity,

$$\limsup_n H(\gamma_n) \le \limsup_n H(\gamma_n^m) = H(\gamma^m) < \varepsilon + H(\gamma) \tag{75}$$

and since this holds for any $\varepsilon > 0$, the claimed inequality follows.

If $H(\gamma) = -\infty$, fix $t < 0$ and take $m$ so large that $H(\gamma^m) < t$. Then

$$\limsup_n H(\gamma_n) \le \limsup_n H(\gamma_n^m) = H(\gamma^m) < t \tag{76}$$

for all $t$, so $\limsup_n H(\gamma_n) = -\infty$ as desired. □

Let $B(\gamma, \varepsilon) = \{\gamma' \mid d_\square(\gamma, \gamma') \le \varepsilon\}$ be the (closed) ball in $\Gamma$ of radius $\varepsilon > 0$
centered at the permuton $\gamma$, and let $B_n(\gamma, \varepsilon)$ be the set of permutations
$\pi \in S_n$ with $\gamma_\pi \in B(\gamma, \varepsilon)$.

Lemma 19. For any permuton $\gamma$, $\lim_{\varepsilon\downarrow 0}\lim_{n\to\infty}\frac{1}{n}\log(|B_n(\gamma, \varepsilon)|/n!)$ exists
and equals $H(\gamma)$.

Proof. Suppose $H(\gamma)$ is finite. It suffices to produce two sets of permutations,
$U \subset B_n(\gamma, \varepsilon)$ and $V \supset B_n(\gamma, \varepsilon)$, each of size

$$\exp\bigl(n\log n - n + n(H(\gamma) + o(\varepsilon^0)) + o(n)\bigr) \tag{77}$$

where by $o(\varepsilon^0)$ we mean a function of $\varepsilon$ (depending on $\gamma$) which approaches
0 as $\varepsilon \to 0$. (The usual notation here would be $o(1)$; we use $o(\varepsilon^0)$ here and
later to make it clear that the relevant variable is $\varepsilon$ and not, e.g., $n$.)

To define $U$, fix $m > 5/\varepsilon$ so that $|H(\gamma^m) - H(\gamma)| < \varepsilon$ and let $n$ be a
multiple of $m$ with $n > m^3/\varepsilon$. Choose integers $n_{i,j}$, $1 \le i, j \le m$, so that:

1. $\sum_{i=1}^{m} n_{i,j} = n/m$ for each $j$;

2. $\sum_{j=1}^{m} n_{i,j} = n/m$ for each $i$; and

3. $|n_{i,j} - n\gamma(Q_{ij})| < 1$ for every $i, j$.

The existence of such a rounding of the matrix $\{n\gamma(Q_{ij})\}$ is guaranteed by
Baranyai's rounding lemma [2].

Let $U$ be the set of permutations $\pi \in S_n$ with exactly $n_{i,j}$ points in the
square $Q_{ij}$, that is, $|\{k : (k/n, \pi(k)/n) \in Q_{ij}\}| = n_{i,j}$ for every $1 \le i, j \le m$.
We show first that $U$ is indeed contained in $B_n(\gamma, \varepsilon)$. Let $R = [a,b] \times [c,d]$ be
a rectangle in $[0,1]^2$. $R$ will contain all $Q_{ij}$ for $i_0 < i < i_1$ and $j_0 < j < j_1$ for
suitable $i_0$, $i_1$, $j_0$ and $j_1$, and by construction the $\gamma_\pi$-measure of the union of
those squares will differ from its $\gamma$-measure by less than $m^2/n < \varepsilon/m$. The
squares cut by $R$ are contained in the union of two rows and two columns of
width $1/m$, and hence, by the construction of $\pi$ and the uniformity of the
marginals of $\gamma$, cannot contribute more than $4/m < 4\varepsilon/5$ to the difference in
measures. Thus, finally, $d_\square(\gamma_\pi, \gamma) < \varepsilon/m + 4\varepsilon/5 < \varepsilon$.

Now we must show that $|U|$ is close to the claimed size

$$\exp\bigl(n\log n - n + H(\gamma)\,n\bigr). \tag{78}$$


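We construct $\pi \in U$ in two phases of $m$ steps each. In step $i$ of Phase I, we
decide for each $k$, $(i-1)n/m < k \le in/m$, which of the $m$ $y$-intervals $\pi(k)$
should lie in. There are

$$\binom{n/m}{n_{i,1},\, n_{i,2},\, \ldots,\, n_{i,m}} = \exp\Bigl(\frac{n}{m}\,h_i + o(n/m)\Bigr) \tag{79}$$

ways to do this, where $h_i = -\sum_{j=1}^{m}\frac{n_{i,j}}{n/m}\log\frac{n_{i,j}}{n/m}$ is the entropy
of the probability distribution $n_{i,\cdot}/(n/m)$.

Thus, the number of ways to accomplish Phase I is

$$\exp\Bigl(o(n) + \frac{n}{m}\sum_{i=1}^{m} h_i\Bigr) = \exp\Bigl(o(n) - n\sum_{i,j}\gamma(Q_{ij})\log\bigl(m\,\gamma(Q_{ij})\bigr)\Bigr). \tag{80}$$

Recalling that the value taken by the density $g^m$ of $\gamma^m$ on the points of $Q_{ij}$
is $m^2\gamma(Q_{ij})$, we have that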

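$$H(\gamma^m) = \sum_{i,j}\frac{1}{m^2}\bigl(-m^2\gamma(Q_{ij})\log(m^2\gamma(Q_{ij}))\bigr) = -\sum_{i,j}\gamma(Q_{ij})\bigl(\log\gamma(Q_{ij}) + 2\log m\bigr) = -2\log m - \sum_{i,j}\gamma(Q_{ij})\log\gamma(Q_{ij}). \tag{81}$$

Therefore we can rewrite the number of ways to do Phase I as

$$\exp\bigl(n\log m + nH(\gamma^m) + o(n)\bigr). \tag{82}$$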
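
The Phase I count can be compared with its asymptotic form on a small example. In the sketch below, which is our own illustration, the matrix `cell_mass` holds hypothetical values of $\gamma(Q_{ij})$ with uniform marginals, `log_phase_one` computes the exact logarithm of the product of multinomial coefficients (one per step $i$), and `entropy_step` evaluates $H(\gamma^m) = -2\log m - \sum_{i,j}\gamma(Q_{ij})\log\gamma(Q_{ij})$.

```python
import math


def log_multinomial(total, parts):
    """log of total! / (parts[0]! * parts[1]! * ...), via the log-gamma function."""
    return math.lgamma(total + 1) - sum(math.lgamma(p + 1) for p in parts)


def entropy_step(cell_mass):
    """H(gamma^m) computed from the m x m matrix of cell masses gamma(Q_ij)."""
    m = len(cell_mass)
    return -2 * math.log(m) - sum(c * math.log(c)
                                  for row in cell_mass for c in row if c > 0)


def log_phase_one(counts):
    """Exact log of the number of Phase I choices: one multinomial per row of counts."""
    return sum(log_multinomial(sum(row), row) for row in counts)


# Hypothetical example: m = 2, each row and column of cell masses sums to 1/m.
cell_mass = [[0.3, 0.2], [0.2, 0.3]]
n, m = 2000, 2
counts = [[round(n * c) for c in row] for row in cell_mass]
exact = log_phase_one(counts)
predicted = n * math.log(m) + n * entropy_step(cell_mass)
```

Here `exact` and `predicted` differ by an amount that is small compared with $n$, in line with the $o(n)$ error term.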

In Phase II we choose a permutation $\pi_j \in S_{n/m}$ for each $j$, $1 \le j \le m$,
and order the $y$-coordinates of the $n/m$ points (taken left to right) in row $j$
according to $\pi_j$. Together with Phase I this determines $\pi$ uniquely, and the
number of ways to accomplish Phase II is

$$\bigl((n/m)!\bigr)^m = \Bigl(\exp\Bigl(\frac{n}{m}\log\frac{n}{m} - \frac{n}{m} + o(n/m)\Bigr)\Bigr)^m = \exp\bigl(n\log n - n - n\log m + o(n)\bigr) \tag{83}$$

so that in total,

$$|U| \ge \exp\bigl(n\log m + nH(\gamma^m) + o(n)\bigr)\exp\bigl(n\log n - n - n\log m + o(n)\bigr) = \exp\bigl(n\log n - n + nH(\gamma^m) + o(n)\bigr) \tag{84}$$

which, since $|H(\gamma) - H(\gamma^m)| < \varepsilon$, does the job.

We now proceed to the other bound, which involves similar calculations
in a somewhat different context. To define the required set $V \supset B_n(\gamma, \varepsilon)$ of
permutations we must allow a wide range for the number of points of $\pi$ that
fall in each square $Q_{ij}$, wide enough so that a violation causes $Q_{ij}$ itself to
witness $d_\square(\gamma_\pi, \gamma) > \varepsilon$, thus guaranteeing that if $\pi \notin V$ then $\pi \notin B_n(\gamma, \varepsilon)$.

To do this we take $m$ large, $\varepsilon < 1/m^4$, and $n > 1/\varepsilon^2$. We define $V$ to be
the set of permutations $\pi \in S_n$ for which the number of points $(k/n, \pi(k)/n)$
falling in $Q_{ij}$ lies in the range $[n(\gamma(Q_{ij}) - \sqrt{\varepsilon}),\ n(\gamma(Q_{ij}) + \sqrt{\varepsilon})]$. Then, as
promised, if $\pi \notin V$ we have a rectangle $R = Q_{ij}$ with
$|\gamma(R) - \gamma_\pi(R)| > \sqrt{\varepsilon}/m^2 > \varepsilon$.

It remains only to bound $|V|$. Here a preliminary phase is needed in which
the exact count of points in each square $Q_{ij}$ is determined; since the range for
each $n_{i,j}$ is of size $2n\sqrt{\varepsilon}$, there are at most $(2n\sqrt{\varepsilon})^{m^2} = \exp(m^2\log(2n\sqrt{\varepsilon}))$
ways to do this, a negligible factor since $m^2\log(n\sqrt{\varepsilon}) = o(n)$. For Phase I we
must assume the $n_{i,j}$ are chosen to maximize each $h_i$, but since the entropy
function $h$ is continuous, the penalty shrinks with $\varepsilon$. Counting as before, we
deduce that here the number of ways to accomplish Phase I is bounded by

$$\exp\bigl(n\log m + n(H(\gamma^m) + o(\varepsilon^0)) + o(n)\bigr) = \exp\bigl(n\log m + n(H(\gamma) + o(\varepsilon^0)) + o(n)\bigr). \tag{85}$$

The computation for Phase II is exactly as before and the conclusion is that

$$|V| \le \exp\bigl(n\log m + n(H(\gamma) + o(\varepsilon^0)) + o(n)\bigr)\times\exp\bigl(n\log n - n - n\log m + o(n)\bigr) = \exp\bigl(n\log n - n + n(H(\gamma) + o(\varepsilon^0)) + o(n)\bigr) \tag{86}$$

proving the lemma in the case where $H(\gamma) > -\infty$.

If $H(\gamma) = -\infty$, we need only the upper bound provided by the set $V$. Fix
$t < 0$ with the idea of showing that $\frac{1}{n}\log(|B_n(\gamma, \varepsilon)|/n!) < t$. Define $V$ as above,
first ensuring that $m$ is large enough so that $H(\gamma^m) < t - 1$. Then the number
of ways to accomplish Phase I is bounded by

$$\exp\bigl(n\log m + n(H(\gamma^m) + o(\varepsilon^0)) + o(n)\bigr) < \exp\bigl(n\log m + n(t - 1 + o(\varepsilon^0)) + o(n)\bigr) \tag{87}$$

and consequently $|V|$ is bounded above by

$$\exp\bigl(n\log n - n + n(t - 1) + o(n)\bigr) < \exp\bigl(n\log n - n + nt\bigr). \tag{88}$$

□ 

We are finally in a position to prove Theorem 1. If our set $\Lambda$ of permutons
is closed, then, since $\Gamma$ is compact, so is $\Lambda$. Let $\delta > 0$ with the idea of showing
that

$$\lim_{n\to\infty}\frac{1}{n}\log\frac{|\Lambda_n|}{n!} \le H(\mu) + \delta \tag{89}$$

for some $\mu \in \Lambda$. If not, for each $\gamma \in \Lambda$ we may, on account of Lemma 19,
choose $\varepsilon_\gamma$ and $n_\gamma$ so that $\frac{1}{n}\log\frac{|B_n(\gamma, \varepsilon_\gamma)|}{n!} < H(\gamma) + \delta/2$ for all $n \ge n_\gamma$. Since a
finite number of these balls cover $\Lambda$, we have too few permutations in $\Lambda_n$ for
large enough $n$, and a contradiction has been reached.

If $\Lambda$ is open, we again let $\delta > 0$, this time with the idea of showing that

$$\lim_{n\to\infty}\frac{1}{n}\log\frac{|\Lambda_n|}{n!} \ge H(\mu) - \delta. \tag{90}$$

To do this we find a permuton $\mu \in \Lambda$ with

$$H(\mu) > \sup_{\gamma\in\Lambda} H(\gamma) - \delta/2, \tag{91}$$

and choose $\varepsilon > 0$ and $n_0$ so that $B_n(\mu, \varepsilon) \subset \Lambda_n$ and
$\frac{1}{n}\log\frac{|B_n(\mu, \varepsilon)|}{n!} > H(\mu) - \delta/2$ for $n \ge n_0$.

This concludes the proof of Theorem 1.